Author | SHA1 | Message | Date |
---|---|---|---|
朱文韬 | fa1f4cae4a | mm malloc v1.2 - 88 | 2 years ago |
朱文韬 | 18bc06645d | mm malloc v1.1 - 66 | 2 years ago |
朱文韬 | f8b538c213 | mm malloc v1 - 66 | 2 years ago |
朱文韬 | 19a2df49ef | add malloc lab | 2 years ago |
朱文韬 | ecdf88cb98 | feat: finish Bomb Lab | 2 years ago |
@ -0,0 +1,452 @@ | |||
####################################################### | |||
# CS:APP Bomb Lab | |||
# Directions to Instructors | |||
# | |||
# Copyright (c) 2003-2016, R. Bryant and D. O'Hallaron | |||
# | |||
####################################################### | |||
This directory contains the files that you will use to build and run | |||
the CS:APP Bomb Lab. The Bomb Lab teaches students principles of | |||
machine-level programs, as well as general debugger and reverse | |||
engineering skills. | |||
*********** | |||
1. Overview | |||
*********** | |||
---- | |||
1.1. Binary Bombs | |||
---- | |||
A "binary bomb" is a Linux executable C program that consists of six | |||
"phases." Each phase expects the student to enter a particular string | |||
on stdin. If the student enters the expected string, then that phase | |||
is "defused." Otherwise the bomb "explodes" by printing "BOOM!!!". | |||
The goal for the students is to defuse as many phases as possible. | |||
---- | |||
1.2. Solving Binary Bombs | |||
---- | |||
In order to defuse the bomb, students must use a debugger, typically | |||
gdb or ddd, to disassemble the binary and single-step through the | |||
machine code in each phase. The idea is to understand what each | |||
assembly statement does, and then use this knowledge to infer the | |||
defusing string. Students earn points for defusing phases, and they | |||
lose points (configurable by the instructor, but typically 1/2 point) | |||
for each explosion. Thus, they quickly learn to set breakpoints before | |||
each phase and the function that explodes the bomb. It's a great | |||
lesson and forces them to learn to use a debugger. | |||
---- | |||
1.3. Autograding Service | |||
---- | |||
We have created a stand-alone user-level autograding service that | |||
handles all aspects of the Bomb Lab for you: Students download their | |||
bombs from a server. As the students work on their bombs, each | |||
explosion and defusion is streamed back to the server, where the | |||
current results for each bomb are displayed on a Web "scoreboard." | |||
There are no explicit handins and the lab is self-grading. | |||
The autograding service consists of four user-level programs that run | |||
in the main ./bomblab directory: | |||
- Request Server (bomblab-requestd.pl). Students download their bombs | |||
and display the scoreboard by pointing a browser at a simple HTTP | |||
server called the "request server." The request server builds the | |||
bomb, archives it in a tar file, and then uploads the resulting tar | |||
file back to the browser, where it can be saved on disk and | |||
untarred. The request server also creates a copy of the bomb and its | |||
solution for the instructor. | |||
- Result Server (bomblab-resultd.pl). Each time a student defuses a | |||
bomb phase or causes an explosion, the bomb sends a short HTTP | |||
message, called an "autoresult string," to an HTTP "result server," | |||
which simply appends the autoresult string to a "scoreboard log file." | |||
- Report Daemon (bomblab-reportd.pl). The "report daemon" periodically | |||
scans the scoreboard log file. The report daemon finds the most recent | |||
defusing string submitted by each student for each phase, and | |||
validates these strings by applying them to a local copy of the | |||
student's bomb. It then updates the HTML scoreboard that summarizes | |||
the current number of explosions and defusions for each bomb, rank | |||
ordered by the total number of accrued points. | |||
- Main daemon (bomblab.pl). The "main daemon" starts and nannies the | |||
request server, result server, and report daemon, ensuring that exactly
one instance of each of these processes (and of itself) is running at any point in
time. If one of these processes dies for some reason, the main daemon | |||
detects this and automatically restarts it. The main daemon is the | |||
only program you actually need to run. | |||
******** | |||
2. Files | |||
******** | |||
The ./bomblab directory contains the following files: | |||
Makefile - For starting/stopping the lab and cleaning files | |||
bomblab.pl* - Main daemon that nannies the other servers & daemons | |||
Bomblab.pm - Bomblab configuration file | |||
bomblab-reportd.pl* - Report daemon that continuously updates scoreboard | |||
bomblab-requestd.pl* - Request server that serves bombs to students | |||
bomblab-resultd.pl* - Result server that gets autoresult strings from bombs | |||
bomblab-scoreboard.html - Real-time Web scoreboard | |||
bomblab-update.pl* - Helper to bomblab-reportd.pl that updates scoreboard | |||
bombs/ - Contains the bombs sent to each student | |||
log-status.txt - Status log with msgs from various servers and daemons | |||
log.txt - Scoreboard log of autoresults received from bombs | |||
makebomb.pl* - Helper script that builds a bomb | |||
scores.txt - Summarizes current scoreboard scores for each student | |||
src/ - The bomb source files | |||
writeup/ - Sample Latex Bomb Lab writeup | |||
******************* | |||
3. Bomb Terminology | |||
******************* | |||
LabID: Each instance (offering) of the lab is identified by a unique | |||
name, e.g., "f12" or "s13", that the instructor chooses. Explosions and
defusions from bombs whose LabIDs are different from the current
LabID are ignored. The LabID must not have any spaces. | |||
BombID: Each bomb in a given instance of the lab has a unique | |||
non-negative integer called the "bombID." | |||
Notifying Bomb: A bomb can be compiled with a NOTIFY option that | |||
causes the bomb to send a message each time the student explodes or | |||
defuses a phase. Such bombs are called "notifying bombs." | |||
Quiet Bomb: If compiled with the NONOTIFY option, then the bomb | |||
doesn't send any messages when it explodes or is defused. Such bombs | |||
are called "quiet bombs." | |||
We will also find it helpful to distinguish between custom and | |||
generic bombs: | |||
Custom Bomb: A "custom bomb" has a BombID > 0, is associated with a | |||
particular student, and can be either notifying or quiet. Custom | |||
notifying bombs are constrained to run on a specific set of Linux | |||
hosts determined by the instructor. On the other hand, custom quiet | |||
bombs can run on any Linux host. | |||
Generic Bomb: A "generic bomb" has a BombID = 0, isn't associated with | |||
any particular student, is quiet, and hence can run on any host. | |||
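The distinctions above can be summarized in a few lines of C. This is
only a minimal sketch: the bomb_info struct and both function names are
hypothetical and do not appear in the Bomb Lab sources.

```c
#include <stdbool.h>

/* Hypothetical description of a bomb; not a type from the lab sources. */
struct bomb_info {
    int  bomb_id;    /* 0 for a generic bomb, > 0 for a custom bomb */
    bool notifying;  /* built with NOTIFY (true) or NONOTIFY (false) */
};

/* A generic bomb has BombID 0; a custom bomb has BombID > 0. */
bool is_generic(const struct bomb_info *b)
{
    return b->bomb_id == 0;
}

/* Quiet bombs can run on any Linux host. Notifying bombs are limited
 * to the hosts chosen by the instructor. */
bool runs_on_any_host(const struct bomb_info *b)
{
    return !b->notifying;
}
```

A generic bomb is always quiet, so runs_on_any_host() holds for every
bomb whose bomb_id is 0.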
************************ | |||
4. Offering the Bomb Lab | |||
************************ | |||
There are two basic flavors of Bomb Lab: In the "online" version, the | |||
instructor uses the autograding service to hand out a custom notifying
bomb to each student on demand, and to automatically track their | |||
progress on the realtime scoreboard. In the "offline" version, the | |||
instructor builds, hands out, and grades the student bombs manually, | |||
without using the autograding service. | |||
While both versions give the students a rich experience, we recommend
the online version. It is clearly the most compelling and fun for the | |||
students, and the easiest for the instructor to grade. However, it | |||
requires that you keep the autograding service running non-stop, | |||
because handouts, grading, and reporting occur continuously for the | |||
duration of the lab. We've made it very easy to run the service, but | |||
some instructors may be uncomfortable with this requirement and will | |||
opt instead for the offline version. | |||
Here are the directions for offering both versions of the lab. | |||
--- | |||
4.1. Create a Bomb Lab Directory | |||
--- | |||
Identify the generic Linux machine ($SERVER_NAME) where you will | |||
create the Bomb Lab directory (./bomblab) and, if you are offering the | |||
online version, run the autograding service. You'll only need to have | |||
a user account on this machine. You don't need root access. | |||
Each offering of the Bomb Lab starts with a clean new ./bomblab | |||
directory on $SERVER_NAME. For example: | |||
linux> tar xvf bomblab.tar | |||
linux> cd bomblab | |||
linux> make cleanallfiles | |||
--- | |||
4.2. Configure the Bomb Lab
--- | |||
Configure the Bomb Lab by editing the following file: | |||
./Bomblab.pm - This is the main configuration file. You will only need | |||
to modify or inspect a few variables in Section 1 of this file. Each | |||
variable is preceded by a descriptive comment. If you are offering the | |||
offline version, you can ignore most of these settings. | |||
If you are offering the online version, you will also need to edit the | |||
following file: | |||
./src/config.h - This file lists the domain names of the hosts that | |||
notifying bombs are allowed to run on. Make sure you update this | |||
correctly, else you and your students won't be able to run your bombs. | |||
---
4.3. Update the Lab Writeup | |||
--- | |||
Once you have updated the configuration files, modify the Latex lab | |||
writeup in ./writeup/bomblab.tex for your environment. Then type the | |||
following in the ./writeup directory: | |||
unix> make clean | |||
unix> make | |||
This will create ps and pdf versions of the writeup.
--- | |||
4.4. Running the Online Bomb Lab | |||
--- | |||
------ | |||
4.4.1. Short Version | |||
------ | |||
From the ./bomblab directory: | |||
(1) Reset the Bomb Lab from scratch by typing | |||
linux> make cleanallfiles | |||
(2) Start the autograding service by typing | |||
linux> make start | |||
(3) Stop the autograding service by typing | |||
linux> make stop | |||
You can start and stop the autograding service as often as you like | |||
without losing any information. When in doubt "make stop; make start" | |||
will get everything in a stable state. | |||
However, resetting the lab deletes all old bombs, status logs, and the | |||
scoreboard log. Do this only during debugging, or the very first time | |||
you start the lab for your students. | |||
Students request bombs by pointing their browsers at | |||
http://$SERVER_NAME:$REQUESTD_PORT/ | |||
Students view the scoreboard by pointing their browsers at | |||
http://$SERVER_NAME:$REQUESTD_PORT/scoreboard | |||
------ | |||
4.4.2. Long Version | |||
------ | |||
(1) Resetting the Bomb Lab. "make stop" ensures that there are no | |||
servers running. "make cleanallfiles" resets the lab from scratch, | |||
deleting all data specific to a particular instance of the lab, such | |||
as the status log, all bombs created by the request server, and the | |||
scoreboard log. Do this when you're ready for the lab to go "live" to | |||
the students. | |||
Resetting is also useful while you're preparing the lab. Before the | |||
lab goes live, you'll want to request a few bombs for yourself, run | |||
them, defuse a few phases, explode a few phases, and make sure that | |||
the results are displayed properly on the scoreboard. If there is a | |||
problem (say because you forgot to update the list of machines the
bombs are allowed to run on in src/config.h) you can fix the
configuration, reset the lab, and then request and run more test | |||
bombs. | |||
CAUTION: If you reset the lab after it's live, you'll lose all your
records of the students' bombs and their solutions. You won't be able
to validate the students' handins. And your students will have to get
new bombs and start over.
(2) Starting the Bomb Lab. "make start" runs bomblab.pl, the main | |||
daemon that starts and nannies the other programs in the service, | |||
checking their status every few seconds and restarting them if | |||
necessary.
(3) Stopping the Bomb Lab. "make stop" kills all of the running | |||
servers. You can start and stop the autograding service as often as | |||
you like without losing any information. When in doubt "make stop; | |||
make start" will get everything in a stable state. | |||
Request Server: The request server is a simple special-purpose HTTP | |||
server that (1) builds and delivers custom bombs to student browsers | |||
on demand, and (2) displays the current state of the real-time | |||
scoreboard. | |||
A student requests a bomb from the request server in two
steps: First, the student points their favorite browser at | |||
http://$SERVER_NAME:$REQUESTD_PORT/ | |||
For example, http://foo.cs.cmu.edu:15213/. The request server | |||
responds by sending an HTML form back to the browser. Next, the | |||
student fills in this form with their user name and email address, and | |||
then submits the form. The request server parses the form, builds and | |||
tars up a notifying custom bomb with bombID=n, and delivers the tar | |||
file to the browser. The student then saves the tar file to disk. When | |||
the student untars this file, it creates a directory (./bomb<n>) with | |||
the following four files: | |||
bomb* Notifying custom bomb executable | |||
bomb.c Source code for the main bomb routine | |||
ID Identifies the student associated with this bomb | |||
README Lists bomb number, student, and email address | |||
The request server also creates a directory (bomblab/bombs/bomb<n>) | |||
that contains the following files: | |||
bomb* Custom bomb executable | |||
bomb.c Source code for main routine | |||
bomb-quiet* A quiet version of bomb used for autograding | |||
ID Identifies the user name assigned to this bomb | |||
phases.c C source code for the bomb phases | |||
README Lists bombID, user name, and email address | |||
solution.txt The solution for this bomb | |||
Result Server: Each time a student defuses a phase or explodes their | |||
bomb, the bomb sends an HTTP message (called an autoresult string) to | |||
the result server, which then appends the message to the scoreboard | |||
log. Each message contains a BombID, a phase, and an indication of the | |||
event that occurred. If the event was a defusion, the message also | |||
contains the "defusing string" that the student typed to defuse the | |||
phase. | |||
Report Daemon: The report daemon periodically scans the scoreboard log
and updates the Web scoreboard. For each bomb, it tallies the number
of explosions, finds the last defused phase, and validates that phase's
defusing string using a quiet copy of the bomb. It then writes a score
for each student to a tab-delimited text file called "scores.txt." The
update frequency is a configuration variable in Bomblab.pm.
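The tallying step can be pictured with a small C sketch. Everything
here is an assumption made for illustration: the real daemon is a Perl
script, the format of the autoresult strings is not described in this
README, and the bomb_event and bomb_tally types, the function names,
and the scoring weights are all hypothetical. Validation against the
quiet bomb is left out.

```c
#include <stddef.h>

/* Hypothetical, already-parsed scoreboard log entry. */
enum bomb_event_kind { EVENT_DEFUSED, EVENT_EXPLODED };

struct bomb_event {
    int bomb_id;
    int phase;                   /* 1..6 */
    enum bomb_event_kind kind;
};

struct bomb_tally {
    int explosions;
    int last_defused_phase;      /* 0 if no phase has been defused */
};

/* Count explosions and find the highest defused phase for one bomb. */
struct bomb_tally tally_bomb(const struct bomb_event *log, size_t n,
                             int bomb_id)
{
    struct bomb_tally t = {0, 0};
    for (size_t i = 0; i < n; i++) {
        if (log[i].bomb_id != bomb_id)
            continue;            /* entry belongs to another bomb */
        if (log[i].kind == EVENT_EXPLODED)
            t.explosions++;
        else if (log[i].phase > t.last_defused_phase)
            t.last_defused_phase = log[i].phase;
    }
    return t;
}

/* Points earned minus the per-explosion penalty. Both weights are
 * parameters because the instructor configures the real values. */
double bomb_score(struct bomb_tally t, double points_per_phase,
                  double explosion_penalty)
{
    return t.last_defused_phase * points_per_phase
         - t.explosions * explosion_penalty;
}
```

With a penalty of 0.5, the typical value mentioned in Section 1.2, two
explosions cost a student one point.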
Instructors and students view the scoreboard by pointing their | |||
browsers at: | |||
http://$SERVER_NAME:$REQUESTD_PORT/scoreboard | |||
------ | |||
4.4.3. Grading the Online Bomb Lab | |||
------ | |||
The online Bomb Lab is self-grading. At any point in time, the | |||
tab-delimited file (./bomblab/scores.txt) contains the most recent | |||
scores for each student. This file is created by the report daemon | |||
each time it generates a new scoreboard. | |||
------ | |||
4.4.4. Additional Notes on the Online Bomb Lab | |||
------ | |||
* Since the request server and report daemon both need to execute | |||
bombs, you must include $SERVER_NAME in the list of legal machines in | |||
your bomblab/src/config.h file. | |||
* All of the servers and daemons are stateless, so you can stop ("make | |||
stop") and start ("make start") the lab as many times as you like | |||
without any ill effects. If you accidentally kill one of the daemons, | |||
or you modify a daemon, or the daemon dies for some reason, then use | |||
"make stop" to clean up, and then restart with "make start". If your | |||
Linux box crashes or reboots, simply restart the daemons with "make | |||
start". | |||
* Information and error messages from the servers are appended to the | |||
"status log" in bomblab/log-status.txt. Servers run quietly, so they | |||
can be started from initrc scripts at boot time. | |||
* See src/README for more information about the anatomy of bombs and | |||
how they are constructed. You don't need to understand any of this to | |||
offer the lab. It's provided only for completeness. | |||
* Before going live with the students, we like to check everything out | |||
by running some tests. We do this by typing | |||
linux> make cleanallfiles | |||
linux> make start | |||
Then we request a bomb for ourselves by pointing a Web browser at | |||
http://$SERVER_NAME:$REQUESTD_PORT | |||
After saving our bomb to disk, we untar it, copy it to a host in the | |||
approved list in src/config.h, and then explode and defuse it a couple | |||
of times to make sure that the explosions and defusions are properly
recorded on the scoreboard, which we check at | |||
http://$SERVER_NAME:$REQUESTD_PORT/scoreboard | |||
Once we're satisfied that everything is OK, we stop the lab | |||
linux> make stop | |||
and then go live: | |||
linux> make cleanallfiles | |||
linux> make start | |||
Once we go live, we type "make stop" and "make start" as often as we | |||
need to, but we are careful never to type "make cleanallfiles" again. | |||
---
4.5. Running the Offline Bomb Lab
---
In this version of the lab, you build your own quiet bombs manually | |||
and then hand them out to the students. The students work on defusing | |||
their bombs offline (i.e., independently of any autograding service) | |||
and then hand in their solution files to you, each of which you grade
manually. | |||
You can use the makebomb.pl script to build your own bombs | |||
manually. The makebomb.pl script also generates the bomb's solution. | |||
Type "./makebomb.pl -h" to see its arguments. | |||
Option 1: The simplest approach for offering the offline Bomb Lab is | |||
to build a single generic bomb that every student attempts to defuse: | |||
linux> ./makebomb.pl -s ./src -b ./bombs | |||
This will create a generic bomb and some other files in ./bombs/bomb0: | |||
bomb* Generic bomb executable (hand out to students)
bomb.c Source code for main routine (hand out to students)
bomb-quiet* Ignore this | |||
ID Ignore this | |||
phases.c C source code for the bomb phases | |||
README Ignore this | |||
solution.txt The solution for this bomb | |||
You will hand out only two of these files to the students: ./bomb and ./bomb.c
The students will hand in their solution files, which you can validate
by feeding to the bomb: | |||
linux> cd bombs/bomb0 | |||
linux> ./bomb < student_solution.txt | |||
This option is easy for the instructor, but we don't recommend it | |||
because it is too easy for the students to cheat. | |||
Option 2. The other option for offering an offline lab is to use the | |||
makebomb.pl script to build a unique quiet custom bomb for each | |||
student: | |||
linux> ./makebomb.pl -i <n> -s ./src -b ./bombs -l bomblab -u <email> -v <uid> | |||
This will create a quiet custom bomb in ./bombs/bomb<n> for the | |||
student whose email address is <email> and whose user name is <uid>: | |||
bomb* Custom bomb executable (hand out to student)
bomb.c Source code for main routine (hand out to student)
bomb-quiet* Ignore this | |||
ID Identifies the student associated with this bomb | |||
phases.c C source code for the bomb phases | |||
README Lists bomb number, student, and email address | |||
solution.txt The solution for this bomb | |||
You will hand out four of these files to the student: bomb, bomb.c, ID,
and README. | |||
Each student will hand in their solution file, which you can validate | |||
by hand by running their custom bomb against their solution: | |||
linux> cd ./bombs/bomb<n> | |||
linux> ./bomb < student_n_solution.txt | |||
The source code for the different phase variants is in ./src/phases/. | |||
@ -0,0 +1,6 @@ | |||
Border relations with Canada have never been better. | |||
1 2 4 8 16 32 | |||
1 311 | |||
7 0 | |||
9?>567 | |||
4 3 2 1 6 5 |
@ -0,0 +1,115 @@ | |||
/*************************************************************************** | |||
* Dr. Evil's Insidious Bomb, Version 1.1 | |||
* Copyright 2011, Dr. Evil Incorporated. All rights reserved. | |||
* | |||
* LICENSE: | |||
* | |||
* Dr. Evil Incorporated (the PERPETRATOR) hereby grants you (the | |||
* VICTIM) explicit permission to use this bomb (the BOMB). This is a | |||
* time limited license, which expires on the death of the VICTIM. | |||
* The PERPETRATOR takes no responsibility for damage, frustration, | |||
* insanity, bug-eyes, carpal-tunnel syndrome, loss of sleep, or other | |||
* harm to the VICTIM. Unless the PERPETRATOR wants to take credit, | |||
* that is. The VICTIM may not distribute this bomb source code to | |||
* any enemies of the PERPETRATOR. No VICTIM may debug, | |||
* reverse-engineer, run "strings" on, decompile, decrypt, or use any | |||
* other technique to gain knowledge of and defuse the BOMB. BOMB | |||
* proof clothing may not be worn when handling this program. The | |||
* PERPETRATOR will not apologize for the PERPETRATOR's poor sense of | |||
* humor. This license is null and void where the BOMB is prohibited | |||
* by law. | |||
***************************************************************************/ | |||
#include <stdio.h> | |||
#include <stdlib.h> | |||
#include "support.h" | |||
#include "phases.h" | |||
/* | |||
* Note to self: Remember to erase this file so my victims will have no | |||
* idea what is going on, and so they will all blow up in a | |||
* spectacularly fiendish explosion. -- Dr. Evil
*/ | |||
FILE *infile;

int main(int argc, char *argv[])
{
    char *input;

    /* Note to self: remember to port this bomb to Windows and put a
     * fantastic GUI on it. */

    /* When run with no arguments, the bomb reads its input lines
     * from standard input. */
    if (argc == 1) {
        infile = stdin;
    }

    /* When run with one argument <file>, the bomb reads from <file>
     * until EOF, and then switches to standard input. Thus, as you
     * defuse each phase, you can add its defusing string to <file> and
     * avoid having to retype it. */
    else if (argc == 2) {
        if (!(infile = fopen(argv[1], "r"))) {
            printf("%s: Error: Couldn't open %s\n", argv[0], argv[1]);
            exit(8);
        }
    }

    /* You can't call the bomb with more than 1 command line argument. */
    else {
        printf("Usage: %s [<input_file>]\n", argv[0]);
        exit(8);
    }

    /* Do all sorts of secret stuff that makes the bomb harder to defuse. */
    initialize_bomb();

    printf("Welcome to my fiendish little bomb. You have 6 phases with\n");
    printf("which to blow yourself up. Have a nice day!\n");

    /* Hmm... Six phases must be more secure than one phase! */
    input = read_line();             /* Get input */
    phase_1(input);                  /* Run the phase */
    phase_defused();                 /* Drat! They figured it out!
                                      * Let me know how they did it. */
    printf("Phase 1 defused. How about the next one?\n");

    /* The second phase is harder. No one will ever figure out
     * how to defuse this... */
    input = read_line();
    phase_2(input);
    phase_defused();
    printf("That's number 2. Keep going!\n");

    /* I guess this is too easy so far. Some more complex code will
     * confuse people. */
    input = read_line();
    phase_3(input);
    phase_defused();
    printf("Halfway there!\n");

    /* Oh yeah? Well, how good is your math? Try on this saucy problem! */
    input = read_line();
    phase_4(input);
    phase_defused();
    printf("So you got that one. Try this one.\n");

    /* Round and 'round in memory we go, where we stop, the bomb blows! */
    input = read_line();
    phase_5(input);
    phase_defused();
    printf("Good work! On to the next...\n");

    /* This phase will never be used, since no one will get past the
     * earlier ones. But just in case, make this one extra hard. */
    input = read_line();
    phase_6(input);
    phase_defused();

    /* Wow, they got it! But isn't something... missing? Perhaps
     * something they overlooked? Mua ha ha ha ha! */

    return 0;
}
@ -0,0 +1,103 @@ | |||
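### 1. Disassemble bomb with objdump
```objdump -d bomb > bomb.s```
### 2. Find the main function
It calls phase_1, phase_2, and so on.
Presumably these functions check whether each input string is correct.
### 3. Examine phase_1
It calls strings_not_equal,
and sets %esi to 0x402400 just before the call,
so that value is passed as an argument and used in the comparison.
It is probably the address of the expected string.
### 4. Inspect the string with gdb
Set a breakpoint on phase_1: ```break phase_1```
Print the string at that address: ```print (char *) 0x402400```
This gives the answer to phase 1.
### 5. Examine phase_2
It calls read_six_numbers,
so the input must be 6 numbers.
The assembly that follows implements a loop.
Working through that code yields the 6 numbers.
Because the input is parsed with sscanf, there is no character conversion to worry about; just type the 6 numbers.
This gives the answer to phase 2.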
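
As a rough illustration, the check in phase_2 behaves like the C below. This is a reconstruction, not the bomb's source: the doubling rule is inferred from the recorded answer `1 2 4 8 16 32`, and the name `phase_2_ok` is made up for this sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Sketch of the phase_2 check: the first number is 1 and every later
 * number is twice the one before it. */
bool phase_2_ok(const char *input)
{
    int n[6];

    /* Like read_six_numbers, reject input with fewer than 6 integers. */
    if (sscanf(input, "%d %d %d %d %d %d",
               &n[0], &n[1], &n[2], &n[3], &n[4], &n[5]) != 6)
        return false;

    if (n[0] != 1)
        return false;

    for (int i = 1; i < 6; i++) {
        if (n[i] != 2 * n[i - 1])
            return false;
    }
    return true;
}
```

Calling `phase_2_ok("1 2 4 8 16 32")` returns true, while any sequence that breaks the doubling pattern returns false.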
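### 6. Examine phase_3
It calls sscanf.
Look at the argument registers %rcx, %rdx, and %rsi.
gdb shows that the input must be two numbers.
A cmpl against 0x7 bounds the first number.
Use gdb to follow the indirect jump instruction jmpq,
which leads into the switch control flow that follows.
So the two input numbers are tied to each other,
which means this phase has more than one solution.
This gives one of the answers to phase 3.
### 7. Examine phase_4
The input has the same form as in phase_3.
The first number must be less than or equal to 0xe.
It calls func4, whose return value must be 0.
The code after the call shows that the second number must be 0.
Inspecting func4 shows that 7 works as the first number.
This gives the answer to phase 4.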
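
A C sketch of what func4 appears to compute may help here. It is a hand reconstruction from the disassembly of the commonly distributed bomb, so treat it as an assumption; in particular the initial range 0..0xe and the helper name `phase_4_ok` are not stated in these notes.

```c
#include <stdbool.h>

/* Reconstructed func4: a binary search for x in [lo, hi] that encodes
 * the path taken. Going left doubles the result, going right doubles
 * it and adds 1, and hitting the midpoint returns 0. */
int func4(int x, int lo, int hi)
{
    int mid = lo + (hi - lo) / 2;

    if (mid > x)
        return 2 * func4(x, lo, mid - 1);
    if (mid < x)
        return 2 * func4(x, mid + 1, hi) + 1;
    return 0;
}

/* Sketch of the phase_4 conditions described above. */
bool phase_4_ok(int first, int second)
{
    if (first < 0 || first > 0xe)
        return false;
    return func4(first, 0, 0xe) == 0 && second == 0;
}
```

With the range 0..0xe the first midpoint is 7, so `func4(7, 0, 0xe)` returns 0 immediately, which matches the recorded answer `7 0`.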
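### 8. Examine phase_5
The input must be a string.
A loop uses the low four bits of each input character's ASCII value to index into a built-in string.
gdb shows that the resulting string must be flyers.
This gives the answer to phase 5.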
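
The lookup can be sketched in C as follows. The table contents are an assumption: they are what the commonly distributed self-study bomb holds, but these notes do not quote them, so check the string in gdb before relying on it. The names `PHASE_5_TABLE` and `phase_5_decode` are invented for the sketch.

```c
#include <stddef.h>

/* Assumed 16-character lookup table; verify it in gdb. */
static const char PHASE_5_TABLE[] = "maduiersnfotvbyl";

/* Map each of the first 6 input characters through the table using the
 * character's low four bits. out must have room for 7 bytes. */
void phase_5_decode(const char *input, char out[7])
{
    size_t i;

    for (i = 0; i < 6 && input[i] != '\0'; i++)
        out[i] = PHASE_5_TABLE[input[i] & 0xf];
    out[i] = '\0';
}
```

Under that assumption the recorded answer `9?>567` has low nibbles 9, 15, 14, 5, 6, 7, which select the letters of flyers. Any other characters with the same low nibbles would work equally well.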
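### 9. Examine phase_6
It calls read_six_numbers, so the input must be 6 numbers.
The first loop requires the numbers to be pairwise distinct and at most 6.
The second loop replaces each number x with 7 - x.
The third loop maps each number to one of 6 addresses, which act as linked-list nodes.
The fourth loop relinks the list in that order.
The fifth loop requires the value at each of the 6 addresses to be greater than or equal to the value at the next one.
This gives the answer to phase 6.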
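
Working backwards from those loops can be sketched in C: order the nodes by descending value, then undo the 7 - x step. The function name `phase_6_solve` is hypothetical, and the node values have to be read out of the bomb with gdb; none are listed in these notes.

```c
/* value[i] is the number stored in node i+1. On return, answer[] holds
 * six inputs that make the relinked list non-increasing. */
void phase_6_solve(const int value[6], int answer[6])
{
    int order[6] = {1, 2, 3, 4, 5, 6};

    /* Insertion sort of node numbers by descending node value. */
    for (int i = 1; i < 6; i++) {
        int node = order[i];
        int j = i - 1;
        while (j >= 0 && value[order[j] - 1] < value[node - 1]) {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = node;
    }

    /* The bomb replaces each input x with 7 - x, so invert that. */
    for (int i = 0; i < 6; i++)
        answer[i] = 7 - order[i];
}
```

For example, with illustrative node values 332, 168, 924, 691, 477, 443 the descending node order is 3 4 5 6 1 2, and the function produces 4 3 2 1 6 5.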
**Note: 0x30 and 0x40 differ by 16, not 10!**
@ -0,0 +1,30 @@ | |||
# | |||
# Students' Makefile for the Malloc Lab | |||
# | |||
TEAM = bovik | |||
VERSION = 1 | |||
HANDINDIR = /afs/cs.cmu.edu/academic/class/15213-f01/malloclab/handin | |||
CC = gcc | |||
CFLAGS = -Wall -Werror -O3 -g -DDRIVER -std=gnu99 -Wno-unused-function -Wno-unused-parameter -m32 | |||
OBJS = mdriver.o mm.o memlib.o fsecs.o fcyc.o clock.o ftimer.o | |||
mdriver: $(OBJS) | |||
	$(CC) $(CFLAGS) -o mdriver $(OBJS)
mdriver.o: mdriver.c fsecs.h fcyc.h clock.h memlib.h config.h mm.h | |||
memlib.o: memlib.c memlib.h | |||
mm.o: mm.c mm.h memlib.h | |||
fsecs.o: fsecs.c fsecs.h config.h | |||
fcyc.o: fcyc.c fcyc.h | |||
ftimer.o: ftimer.c ftimer.h config.h | |||
clock.o: clock.c clock.h | |||
handin: | |||
	cp mm.c $(HANDINDIR)/$(TEAM)-$(VERSION)-mm.c
clean: | |||
	rm -f *~ *.o mdriver
@ -0,0 +1,52 @@ | |||
##################################################################### | |||
# CS:APP Malloc Lab | |||
# Handout files for students | |||
# | |||
# Copyright (c) 2002, R. Bryant and D. O'Hallaron, All rights reserved. | |||
# May not be used, modified, or copied without permission. | |||
# | |||
###################################################################### | |||
*********** | |||
Main Files: | |||
*********** | |||
mm.{c,h} | |||
Your solution malloc package. mm.c is the file that you | |||
will be handing in, and is the only file you should modify. | |||
mdriver.c | |||
The malloc driver that tests your mm.c file | |||
short{1,2}-bal.rep | |||
Two tiny tracefiles to help you get started. | |||
Makefile | |||
Builds the driver | |||
********************************** | |||
Other support files for the driver | |||
********************************** | |||
config.h Configures the malloc lab driver | |||
fsecs.{c,h} Wrapper function for the different timer packages | |||
clock.{c,h} Routines for accessing the Pentium and Alpha cycle counters | |||
fcyc.{c,h} Timer functions based on cycle counters | |||
ftimer.{c,h} Timer functions based on interval timers and gettimeofday() | |||
memlib.{c,h} Models the heap and sbrk function | |||
******************************* | |||
Building and running the driver | |||
******************************* | |||
To build the driver, type "make" to the shell. | |||
To run the driver on a tiny test trace: | |||
unix> mdriver -V -f short1-bal.rep | |||
The -V option prints out helpful tracing and summary information. | |||
To get a list of the driver flags: | |||
unix> mdriver -h | |||
@ -0,0 +1,279 @@ | |||
/* | |||
* clock.c - Routines for using the cycle counters on x86, | |||
* Alpha, and Sparc boxes. | |||
* | |||
* Copyright (c) 2002, R. Bryant and D. O'Hallaron, All rights reserved. | |||
* May not be used, modified, or copied without permission. | |||
*/ | |||
#include <stdio.h> | |||
#include <stdlib.h> | |||
#include <unistd.h> | |||
#include <sys/times.h> | |||
#include "clock.h" | |||
/******************************************************* | |||
* Machine dependent functions | |||
* | |||
* Note: the constants __i386__ and __alpha | |||
* are set by GCC when it calls the C preprocessor | |||
* You can verify this for yourself using gcc -v. | |||
*******************************************************/ | |||
#if defined(__i386__) | |||
/******************************************************* | |||
* Pentium versions of start_counter() and get_counter() | |||
*******************************************************/ | |||
/* $begin x86cyclecounter */
/* Initialize the cycle counter */
static unsigned cyc_hi = 0;
static unsigned cyc_lo = 0;

/* Set *hi and *lo to the high and low order bits of the cycle counter.
   Implementation requires assembly code to use the rdtsc instruction. */
void access_counter(unsigned *hi, unsigned *lo)
{
    asm("rdtsc; movl %%edx,%0; movl %%eax,%1"   /* Read cycle counter */
        : "=r" (*hi), "=r" (*lo)                /* and move results to */
        : /* No input */                        /* the two outputs */
        : "%edx", "%eax");
}

/* Record the current value of the cycle counter. */
void start_counter()
{
    access_counter(&cyc_hi, &cyc_lo);
}

/* Return the number of cycles since the last call to start_counter. */
double get_counter()
{
    unsigned ncyc_hi, ncyc_lo;
    unsigned hi, lo, borrow;
    double result;

    /* Get cycle counter */
    access_counter(&ncyc_hi, &ncyc_lo);

    /* Do double precision subtraction */
    lo = ncyc_lo - cyc_lo;
    borrow = lo > ncyc_lo;
    hi = ncyc_hi - cyc_hi - borrow;
    result = (double) hi * (1 << 30) * 4 + lo;
    if (result < 0) {
        fprintf(stderr, "Error: counter returns neg value: %.0f\n", result);
    }
    return result;
}
/* $end x86cyclecounter */
#elif defined(__alpha) | |||
/**************************************************** | |||
* Alpha versions of start_counter() and get_counter() | |||
***************************************************/ | |||
/* Initialize the cycle counter */ | |||
static unsigned cyc_hi = 0; | |||
static unsigned cyc_lo = 0; | |||
/* Use Alpha cycle timer to compute cycles. Then use | |||
measured clock speed to compute seconds | |||
*/ | |||
/* | |||
* counterRoutine is an array of Alpha instructions to access | |||
* the Alpha's processor cycle counter. It uses the rpcc | |||
* instruction to access the counter. This 64 bit register is | |||
* divided into two parts. The lower 32 bits are the cycles | |||
* used by the current process. The upper 32 bits are wall | |||
* clock cycles. These instructions read the counter, and | |||
* convert the lower 32 bits into an unsigned int - this is the | |||
* user space counter value. | |||
* NOTE: The counter has a very limited time span. With a | |||
* 450MhZ clock the counter can time things for about 9 | |||
* seconds. */ | |||
static unsigned int counterRoutine[] =
{
    0x601fc000u,
    0x401f0000u,
    0x6bfa8001u
};

/* Cast the above instructions into a function. */
static unsigned int (*counter)(void)= (void *)counterRoutine;

void start_counter()
{
    /* Get cycle counter */
    cyc_hi = 0;
    cyc_lo = counter();
}

double get_counter()
{
    unsigned ncyc_hi, ncyc_lo;
    unsigned hi, lo, borrow;
    double result;

    ncyc_lo = counter();
    ncyc_hi = 0;
    lo = ncyc_lo - cyc_lo;
    borrow = lo > ncyc_lo;
    hi = ncyc_hi - cyc_hi - borrow;
    result = (double) hi * (1 << 30) * 4 + lo;
    if (result < 0) {
        fprintf(stderr, "Error: Cycle counter returning negative value: %.0f\n", result);
    }
    return result;
}
#else | |||
/**************************************************************** | |||
* All the other platforms for which we haven't implemented cycle | |||
* counter routines. Newer models of sparcs (v8plus) have cycle | |||
* counters that can be accessed from user programs, but since there | |||
* are still many sparc boxes out there that don't support this, we | |||
* haven't provided a Sparc version here. | |||
***************************************************************/ | |||
void start_counter() | |||
{ | |||
printf("ERROR: You are trying to use a start_counter routine in clock.c\n"); | |||
printf("that has not been implemented yet on this platform.\n"); | |||
printf("Please choose another timing package in config.h.\n"); | |||
exit(1); | |||
} | |||
double get_counter() | |||
{ | |||
printf("ERROR: You are trying to use a get_counter routine in clock.c\n"); | |||
printf("that has not been implemented yet on this platform.\n"); | |||
printf("Please choose another timing package in config.h.\n"); | |||
exit(1); | |||
} | |||
#endif | |||
/******************************* | |||
* Machine-independent functions | |||
******************************/ | |||
double ovhd() | |||
{ | |||
/* Do it twice to eliminate cache effects */ | |||
int i; | |||
double result; | |||
for (i = 0; i < 2; i++) { | |||
start_counter(); | |||
result = get_counter(); | |||
} | |||
return result; | |||
} | |||
/* $begin mhz */ | |||
/* Estimate the clock rate by measuring the cycles that elapse */ | |||
/* while sleeping for sleeptime seconds */ | |||
double mhz_full(int verbose, int sleeptime) | |||
{ | |||
double rate; | |||
start_counter(); | |||
sleep(sleeptime); | |||
rate = get_counter() / (1e6*sleeptime); | |||
if (verbose) | |||
printf("Processor clock rate ~= %.1f MHz\n", rate); | |||
return rate; | |||
} | |||
/* $end mhz */ | |||
/* Version using a default sleeptime */ | |||
double mhz(int verbose) | |||
{ | |||
return mhz_full(verbose, 2); | |||
} | |||
/** Special counters that compensate for timer interrupt overhead */ | |||
static double cyc_per_tick = 0.0; | |||
#define NEVENT 100 | |||
#define THRESHOLD 1000 | |||
#define RECORDTHRESH 3000 | |||
/* Attempt to see how much time is used by timer interrupt */ | |||
static void callibrate(int verbose) | |||
{ | |||
double oldt; | |||
struct tms t; | |||
clock_t oldc; | |||
int e = 0; | |||
times(&t); | |||
oldc = t.tms_utime; | |||
start_counter(); | |||
oldt = get_counter(); | |||
while (e <NEVENT) { | |||
double newt = get_counter(); | |||
if (newt-oldt >= THRESHOLD) { | |||
clock_t newc; | |||
times(&t); | |||
newc = t.tms_utime; | |||
if (newc > oldc) { | |||
double cpt = (newt-oldt)/(newc-oldc); | |||
if ((cyc_per_tick == 0.0 || cyc_per_tick > cpt) && cpt > RECORDTHRESH) | |||
cyc_per_tick = cpt; | |||
/* | |||
if (verbose) | |||
printf("Saw event lasting %.0f cycles and %d ticks. Ratio = %f\n", | |||
newt-oldt, (int) (newc-oldc), cpt); | |||
*/ | |||
e++; | |||
oldc = newc; | |||
} | |||
oldt = newt; | |||
} | |||
} | |||
if (verbose) | |||
printf("Setting cyc_per_tick to %f\n", cyc_per_tick); | |||
} | |||
static clock_t start_tick = 0; | |||
void start_comp_counter() | |||
{ | |||
struct tms t; | |||
if (cyc_per_tick == 0.0) | |||
callibrate(0); | |||
times(&t); | |||
start_tick = t.tms_utime; | |||
start_counter(); | |||
} | |||
double get_comp_counter() | |||
{ | |||
double time = get_counter(); | |||
double ctime; | |||
struct tms t; | |||
clock_t ticks; | |||
times(&t); | |||
ticks = t.tms_utime - start_tick; | |||
ctime = time - ticks*cyc_per_tick; | |||
/* | |||
printf("Measured %.0f cycles. Ticks = %d. Corrected %.0f cycles\n", | |||
time, (int) ticks, ctime); | |||
*/ | |||
return ctime; | |||
} | |||
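/*
 * Illustrative sketch, not part of clock.c: the two-word subtraction
 * that get_counter() performs on the cycle counter can be exercised on
 * its own. The name cycles_between and its parameters are hypothetical;
 * the sketch assumes 32-bit words, as the unsigned variables above do
 * on the platforms this file targets.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Subtract the 64-bit value (start_hi:start_lo) from (now_hi:now_lo)
 * using only 32-bit words, following the lo/borrow/hi steps used in
 * get_counter(). */
double cycles_between(uint32_t start_hi, uint32_t start_lo,
                      uint32_t now_hi, uint32_t now_lo)
{
    uint32_t lo = now_lo - start_lo;           /* wraps modulo 2^32 */
    uint32_t borrow = lo > now_lo;             /* 1 if the low word wrapped */
    uint32_t hi = now_hi - start_hi - borrow;  /* propagate the borrow */
    return (double)hi * 4294967296.0 + lo;     /* hi * 2^32 + lo */
}
```

/*
 * When the low word wraps (now_lo < start_lo), the unsigned difference
 * comes out larger than now_lo, which is exactly what the borrow test
 * detects.
 */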
@ -0,0 +1,22 @@ | |||
/* Routines for using cycle counter */ | |||
/* Start the counter */ | |||
void start_counter(); | |||
/* Get # cycles since counter started */ | |||
double get_counter(); | |||
/* Measure overhead for counter */ | |||
double ovhd(); | |||
/* Determine clock rate of processor (using a default sleeptime) */ | |||
double mhz(int verbose); | |||
/* Determine clock rate of processor, having more control over accuracy */ | |||
double mhz_full(int verbose, int sleeptime); | |||
/** Special counters that compensate for timer interrupt overhead */ | |||
void start_comp_counter(); | |||
double get_comp_counter(); |
@ -0,0 +1,72 @@ | |||
#ifndef __CONFIG_H_ | |||
#define __CONFIG_H_ | |||
/* | |||
* config.h - malloc lab configuration file | |||
* | |||
* Copyright (c) 2002, R. Bryant and D. O'Hallaron, All rights reserved. | |||
* May not be used, modified, or copied without permission. | |||
*/ | |||
/* | |||
* This is the default path where the driver will look for the | |||
* default tracefiles. You can override it at runtime with the -t flag. | |||
*/ | |||
#define TRACEDIR "/afs/cs/project/ics2/im/labs/malloclab/traces/" | |||
/* | |||
* This is the list of default tracefiles in TRACEDIR that the driver | |||
* will use for testing. Modify this if you want to add or delete | |||
* traces from the driver's test suite. For example, if you don't want | |||
* your students to implement realloc, you can delete the last two | |||
* traces. | |||
*/ | |||
#define DEFAULT_TRACEFILES \ | |||
"amptjp-bal.rep",\ | |||
"cccp-bal.rep",\ | |||
"cp-decl-bal.rep",\ | |||
"expr-bal.rep",\ | |||
"coalescing-bal.rep",\ | |||
"random-bal.rep",\ | |||
"random2-bal.rep",\ | |||
"binary-bal.rep",\ | |||
"binary2-bal.rep",\ | |||
"realloc-bal.rep",\ | |||
"realloc2-bal.rep" | |||
/* | |||
* This constant gives the estimated performance of the libc malloc | |||
* package using our traces on some reference system, typically the | |||
* same kind of system the students use. Its purpose is to cap the | |||
* contribution of throughput to the performance index. Once the | |||
* students surpass the AVG_LIBC_THRUPUT, they get no further benefit | |||
* to their score. This deters students from building extremely fast, | |||
* but extremely stupid malloc packages. | |||
*/ | |||
#define AVG_LIBC_THRUPUT 600E3 /* 600 Kops/sec */ | |||
/* | |||
* This constant determines the contributions of space utilization | |||
* (UTIL_WEIGHT) and throughput (1 - UTIL_WEIGHT) to the performance | |||
* index. | |||
*/ | |||
#define UTIL_WEIGHT .60 | |||
/* | |||
* Alignment requirement in bytes (either 4 or 8) | |||
*/ | |||
#define ALIGNMENT 8 | |||
/* | |||
* Maximum heap size in bytes | |||
*/ | |||
#define MAX_HEAP (20*(1<<20)) /* 20 MB */ | |||
/***************************************************************************** | |||
* Set exactly one of these USE_xxx constants to "1" to select a timing method | |||
*****************************************************************************/ | |||
#define USE_FCYC 0 /* cycle counter w/K-best scheme (x86 & Alpha only) */ | |||
#define USE_ITIMER 0 /* interval timer (any Unix box) */ | |||
#define USE_GETTOD 1 /* gettimeofday (any Unix box) */ | |||
#endif /* __CONFIG_H_ */ |
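/*
 * Illustrative sketch, not part of config.h: one way UTIL_WEIGHT and
 * AVG_LIBC_THRUPUT could combine into a performance index, as the
 * comments above describe. The driver that computes the real index is
 * not shown in this section, so the exact formula, the 0..100 scale,
 * and the DEMO_ names are assumptions.
 */

```c
#include <assert.h>

#define DEMO_UTIL_WEIGHT 0.60        /* mirrors UTIL_WEIGHT */
#define DEMO_AVG_LIBC_THRUPUT 600E3  /* mirrors AVG_LIBC_THRUPUT (ops/sec) */

/* util is the space utilization in [0, 1]; thruput is in ops/sec. */
double demo_perf_index(double util, double thruput)
{
    double ratio = thruput / DEMO_AVG_LIBC_THRUPUT;
    if (ratio > 1.0)
        ratio = 1.0;   /* no further credit beyond the libc reference */
    return 100.0 * (DEMO_UTIL_WEIGHT * util +
                    (1.0 - DEMO_UTIL_WEIGHT) * ratio);
}
```

/*
 * Under these assumptions, a package at 90% utilization scores 74 at
 * 300 Kops/sec and 94 at any throughput of 600 Kops/sec or more.
 */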
@ -0,0 +1,251 @@ | |||
/* | |||
* fcyc.c - Estimate the time (in CPU cycles) used by a function f | |||
* | |||
* Copyright (c) 2002, R. Bryant and D. O'Hallaron, All rights reserved. | |||
* May not be used, modified, or copied without permission. | |||
* | |||
* Uses the cycle timer routines in clock.c to estimate the | |||
* time in CPU cycles for a function f. | |||
*/ | |||
#include <stdlib.h> | |||
#include <sys/times.h> | |||
#include <stdio.h> | |||
#include "fcyc.h" | |||
#include "clock.h" | |||
/* Default values */ | |||
#define K 3 /* Value of K in K-best scheme */ | |||
#define MAXSAMPLES 20 /* Give up after MAXSAMPLES */ | |||
#define EPSILON 0.01 /* K samples should be within EPSILON of each other */ | |||
#define COMPENSATE 0 /* 1-> try to compensate for clock ticks */ | |||
#define CLEAR_CACHE 0 /* Clear cache before running test function */ | |||
#define CACHE_BYTES (1<<19) /* Max cache size in bytes */ | |||
#define CACHE_BLOCK 32 /* Cache block size in bytes */ | |||
static int kbest = K; | |||
static int maxsamples = MAXSAMPLES; | |||
static double epsilon = EPSILON; | |||
static int compensate = COMPENSATE; | |||
static int clear_cache = CLEAR_CACHE; | |||
static int cache_bytes = CACHE_BYTES; | |||
static int cache_block = CACHE_BLOCK; | |||
static int *cache_buf = NULL; | |||
static double *values = NULL; | |||
static int samplecount = 0; | |||
/* for debugging only */ | |||
#define KEEP_VALS 0 | |||
#define KEEP_SAMPLES 0 | |||
#if KEEP_SAMPLES | |||
static double *samples = NULL; | |||
#endif | |||
/* | |||
* init_sampler - Start new sampling process | |||
*/ | |||
static void init_sampler() | |||
{ | |||
if (values) | |||
free(values); | |||
values = calloc(kbest, sizeof(double)); | |||
#if KEEP_SAMPLES | |||
if (samples) | |||
free(samples); | |||
/* Allocate extra for wraparound analysis */ | |||
samples = calloc(maxsamples+kbest, sizeof(double)); | |||
#endif | |||
samplecount = 0; | |||
} | |||
/* | |||
* add_sample - Add new sample | |||
*/ | |||
static void add_sample(double val) | |||
{ | |||
int pos = 0; | |||
if (samplecount < kbest) { | |||
pos = samplecount; | |||
values[pos] = val; | |||
} else if (val < values[kbest-1]) { | |||
pos = kbest-1; | |||
values[pos] = val; | |||
} | |||
#if KEEP_SAMPLES | |||
samples[samplecount] = val; | |||
#endif | |||
samplecount++; | |||
/* Insertion sort */ | |||
while (pos > 0 && values[pos-1] > values[pos]) { | |||
double temp = values[pos-1]; | |||
values[pos-1] = values[pos]; | |||
values[pos] = temp; | |||
pos--; | |||
} | |||
} | |||
/* | |||
* has_converged - Have the kbest minimum measurements converged within epsilon? | |||
*/ | |||
static int has_converged() | |||
{ | |||
return | |||
(samplecount >= kbest) && | |||
((1 + epsilon)*values[0] >= values[kbest-1]); | |||
} | |||
/* | |||
* clear - Code to clear cache | |||
*/ | |||
static volatile int sink = 0; | |||
static void clear() | |||
{ | |||
int x = sink; | |||
int *cptr, *cend; | |||
int incr = cache_block/sizeof(int); | |||
if (!cache_buf) { | |||
cache_buf = malloc(cache_bytes); | |||
if (!cache_buf) { | |||
fprintf(stderr, "Fatal error. Malloc returned null when trying to clear cache\n"); | |||
exit(1); | |||
} | |||
} | |||
cptr = (int *) cache_buf; | |||
cend = cptr + cache_bytes/sizeof(int); | |||
while (cptr < cend) { | |||
x += *cptr; | |||
cptr += incr; | |||
} | |||
sink = x; | |||
} | |||
/* | |||
* fcyc - Use K-best scheme to estimate the running time of function f | |||
*/ | |||
double fcyc(test_funct f, void *argp) | |||
{ | |||
double result; | |||
init_sampler(); | |||
if (compensate) { | |||
do { | |||
double cyc; | |||
if (clear_cache) | |||
clear(); | |||
start_comp_counter(); | |||
f(argp); | |||
cyc = get_comp_counter(); | |||
add_sample(cyc); | |||
} while (!has_converged() && samplecount < maxsamples); | |||
} else { | |||
do { | |||
double cyc; | |||
if (clear_cache) | |||
clear(); | |||
start_counter(); | |||
f(argp); | |||
cyc = get_counter(); | |||
add_sample(cyc); | |||
} while (!has_converged() && samplecount < maxsamples); | |||
} | |||
#ifdef DEBUG | |||
{ | |||
int i; | |||
printf(" %d smallest values: [", kbest); | |||
for (i = 0; i < kbest; i++) | |||
printf("%.0f%s", values[i], i==kbest-1 ? "]\n" : ", "); | |||
} | |||
#endif | |||
result = values[0]; | |||
#if !KEEP_VALS | |||
free(values); | |||
values = NULL; | |||
#endif | |||
return result; | |||
} | |||
/************************************************************* | |||
* Set the various parameters used by the measurement routines | |||
************************************************************/ | |||
/* | |||
* set_fcyc_clear_cache - When set, will run code to clear cache | |||
* before each measurement. | |||
* Default = 0 | |||
*/ | |||
void set_fcyc_clear_cache(int clear) | |||
{ | |||
clear_cache = clear; | |||
} | |||
/* | |||
* set_fcyc_cache_size - Set size of cache to use when clearing cache | |||
* Default = 1<<19 (512KB) | |||
*/ | |||
void set_fcyc_cache_size(int bytes) | |||
{ | |||
if (bytes != cache_bytes) { | |||
cache_bytes = bytes; | |||
if (cache_buf) { | |||
free(cache_buf); | |||
cache_buf = NULL; | |||
} | |||
} | |||
} | |||
/* | |||
* set_fcyc_cache_block - Set size of cache block | |||
* Default = 32 | |||
*/ | |||
void set_fcyc_cache_block(int bytes) { | |||
cache_block = bytes; | |||
} | |||
/* | |||
* set_fcyc_compensate - When set, will attempt to compensate for | |||
* timer interrupt overhead | |||
* Default = 0 | |||
*/ | |||
void set_fcyc_compensate(int compensate_arg) | |||
{ | |||
compensate = compensate_arg; | |||
} | |||
/* | |||
* set_fcyc_k - Value of K in K-best measurement scheme | |||
* Default = 3 | |||
*/ | |||
void set_fcyc_k(int k) | |||
{ | |||
kbest = k; | |||
} | |||
/* | |||
* set_fcyc_maxsamples - Maximum number of samples attempting to find | |||
* K-best within some tolerance. | |||
* When exceeded, just return best sample found. | |||
* Default = 20 | |||
*/ | |||
void set_fcyc_maxsamples(int maxsamples_arg) | |||
{ | |||
maxsamples = maxsamples_arg; | |||
} | |||
/* | |||
* set_fcyc_epsilon - Tolerance required for K-best | |||
* Default = 0.01 | |||
*/ | |||
void set_fcyc_epsilon(double epsilon_arg) | |||
{ | |||
epsilon = epsilon_arg; | |||
} | |||
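/*
 * Illustrative sketch, not part of fcyc.c: the K-best scheme used by
 * fcyc() reduced to its bookkeeping. It keeps the K smallest samples in
 * ascending order and reports convergence once the K-th smallest is
 * within a factor (1 + eps) of the smallest. All demo_ names and
 * DEMO_K are hypothetical.
 */

```c
#include <assert.h>

#define DEMO_K 3                  /* number of best samples kept */

static double demo_vals[DEMO_K];  /* the DEMO_K smallest samples, ascending */
static int demo_count = 0;        /* total samples seen */

void demo_reset(void)
{
    demo_count = 0;
}

/* Record a sample, keeping only the DEMO_K smallest in sorted order. */
void demo_add(double val)
{
    int pos;
    if (demo_count < DEMO_K) {
        pos = demo_count;                 /* still filling the array */
    } else if (val < demo_vals[DEMO_K - 1]) {
        pos = DEMO_K - 1;                 /* replaces the current K-th best */
    } else {
        demo_count++;                     /* too large: count it, drop it */
        return;
    }
    demo_vals[pos] = val;
    demo_count++;
    /* One insertion-sort pass restores ascending order. */
    while (pos > 0 && demo_vals[pos - 1] > demo_vals[pos]) {
        double tmp = demo_vals[pos - 1];
        demo_vals[pos - 1] = demo_vals[pos];
        demo_vals[pos] = tmp;
        pos--;
    }
}

int demo_converged(double eps)
{
    return demo_count >= DEMO_K &&
           (1.0 + eps) * demo_vals[0] >= demo_vals[DEMO_K - 1];
}

double demo_best(void)
{
    return demo_vals[0];
}
```

/*
 * With eps = 0.01, the samples 1000, 1500, 1005 have not converged
 * because 1500 is an outlier. Adding 1008 displaces it, and the three
 * best values then agree to within 1%.
 */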
@ -0,0 +1,68 @@ | |||
/* | |||
* fcyc.h - prototypes for the routines in fcyc.c that estimate the | |||
* time in CPU cycles used by a test function f | |||
* | |||
* Copyright (c) 2002, R. Bryant and D. O'Hallaron, All rights reserved. | |||
* May not be used, modified, or copied without permission. | |||
* | |||
*/ | |||
/* The test function takes a generic pointer as input */ | |||
typedef void (*test_funct)(void *); | |||
/* Compute number of cycles used by test function f */ | |||
double fcyc(test_funct f, void* argp); | |||
/********************************************************* | |||
* Set the various parameters used by measurement routines | |||
*********************************************************/ | |||
/* | |||
* set_fcyc_clear_cache - When set, will run code to clear cache | |||
* before each measurement. | |||
* Default = 0 | |||
*/ | |||
void set_fcyc_clear_cache(int clear); | |||
/* | |||
* set_fcyc_cache_size - Set size of cache to use when clearing cache | |||
* Default = 1<<19 (512KB) | |||
*/ | |||
void set_fcyc_cache_size(int bytes); | |||
/* | |||
* set_fcyc_cache_block - Set size of cache block | |||
* Default = 32 | |||
*/ | |||
void set_fcyc_cache_block(int bytes); | |||
/* | |||
* set_fcyc_compensate - When set, will attempt to compensate for | |||
* timer interrupt overhead | |||
* Default = 0 | |||
*/ | |||
void set_fcyc_compensate(int compensate_arg); | |||
/* | |||
* set_fcyc_k - Value of K in K-best measurement scheme | |||
* Default = 3 | |||
*/ | |||
void set_fcyc_k(int k); | |||
/* | |||
* set_fcyc_maxsamples - Maximum number of samples attempting to find | |||
* K-best within some tolerance. | |||
* When exceeded, just return best sample found. | |||
* Default = 20 | |||
*/ | |||
void set_fcyc_maxsamples(int maxsamples_arg); | |||
/* | |||
* set_fcyc_epsilon - Tolerance required for K-best | |||
* Default = 0.01 | |||
*/ | |||
void set_fcyc_epsilon(double epsilon_arg); | |||
@ -0,0 +1,57 @@ | |||
/**************************** | |||
* High-level timing wrappers | |||
****************************/ | |||
#include <stdio.h> | |||
#include "fsecs.h" | |||
#include "fcyc.h" | |||
#include "clock.h" | |||
#include "ftimer.h" | |||
#include "config.h" | |||
static double Mhz; /* estimated CPU clock frequency */ | |||
extern int verbose; /* -v option in mdriver.c */ | |||
/* | |||
* init_fsecs - initialize the timing package | |||
*/ | |||
void init_fsecs(void) | |||
{ | |||
Mhz = 0; /* keep gcc -Wall happy */ | |||
#if USE_FCYC | |||
if (verbose) | |||
printf("Measuring performance with a cycle counter.\n"); | |||
/* set key parameters for the fcyc package */ | |||
set_fcyc_maxsamples(20); | |||
set_fcyc_clear_cache(1); | |||
set_fcyc_compensate(1); | |||
set_fcyc_epsilon(0.01); | |||
set_fcyc_k(3); | |||
Mhz = mhz(verbose > 0); | |||
#elif USE_ITIMER | |||
if (verbose) | |||
printf("Measuring performance with the interval timer.\n"); | |||
#elif USE_GETTOD | |||
if (verbose) | |||
printf("Measuring performance with gettimeofday().\n"); | |||
#endif | |||
} | |||
/* | |||
* fsecs - Return the running time of a function f (in seconds) | |||
*/ | |||
double fsecs(fsecs_test_funct f, void *argp) | |||
{ | |||
#if USE_FCYC | |||
double cycles = fcyc(f, argp); | |||
return cycles/(Mhz*1e6); | |||
#elif USE_ITIMER | |||
return ftimer_itimer(f, argp, 10); | |||
#elif USE_GETTOD | |||
return ftimer_gettod(f, argp, 10); | |||
#endif | |||
} | |||
@ -0,0 +1,4 @@ | |||
typedef void (*fsecs_test_funct)(void *); | |||
void init_fsecs(void); | |||
double fsecs(fsecs_test_funct f, void *argp); |
@ -0,0 +1,106 @@ | |||
/* | |||
* ftimer.c - Estimate the time (in seconds) used by a function f | |||
* | |||
* Copyright (c) 2002, R. Bryant and D. O'Hallaron, All rights reserved. | |||
* May not be used, modified, or copied without permission. | |||
* | |||
* Function timers that estimate the running time (in seconds) of a function f. | |||
* ftimer_itimer: version that uses the interval timer | |||
* ftimer_gettod: version that uses gettimeofday | |||
*/ | |||
#include <stdio.h> | |||
#include <sys/time.h> | |||
#include "ftimer.h" | |||
/* function prototypes */ | |||
static void init_etime(void); | |||
static double get_etime(void); | |||
/* | |||
* ftimer_itimer - Use the interval timer to estimate the running time | |||
* of f(argp). Return the average of n runs. | |||
*/ | |||
double ftimer_itimer(ftimer_test_funct f, void *argp, int n) | |||
{ | |||
double start, tmeas; | |||
int i; | |||
init_etime(); | |||
start = get_etime(); | |||
for (i = 0; i < n; i++) | |||
f(argp); | |||
tmeas = get_etime() - start; | |||
return tmeas / n; | |||
} | |||
/* | |||
* ftimer_gettod - Use gettimeofday to estimate the running time of | |||
* f(argp). Return the average of n runs. | |||
*/ | |||
double ftimer_gettod(ftimer_test_funct f, void *argp, int n) | |||
{ | |||
int i; | |||
struct timeval stv, etv; | |||
double diff; | |||
gettimeofday(&stv, NULL); | |||
for (i = 0; i < n; i++) | |||
f(argp); | |||
gettimeofday(&etv,NULL); | |||
diff = 1E3*(etv.tv_sec - stv.tv_sec) + 1E-3*(etv.tv_usec-stv.tv_usec); | |||
diff /= n; | |||
return (1E-3*diff); | |||
} | |||
/* | |||
* Routines for manipulating the Unix interval timer | |||
*/ | |||
/* The initial value of the interval timer */ | |||
#define MAX_ETIME 86400 | |||
/* static variables that hold the initial value of the interval timer */ | |||
static struct itimerval first_u; /* user time */ | |||
static struct itimerval first_r; /* real time */ | |||
static struct itimerval first_p; /* prof time*/ | |||
/* init the timer */ | |||
static void init_etime(void) | |||
{ | |||
first_u.it_interval.tv_sec = 0; | |||
first_u.it_interval.tv_usec = 0; | |||
first_u.it_value.tv_sec = MAX_ETIME; | |||
first_u.it_value.tv_usec = 0; | |||
setitimer(ITIMER_VIRTUAL, &first_u, NULL); | |||
first_r.it_interval.tv_sec = 0; | |||
first_r.it_interval.tv_usec = 0; | |||
first_r.it_value.tv_sec = MAX_ETIME; | |||
first_r.it_value.tv_usec = 0; | |||
setitimer(ITIMER_REAL, &first_r, NULL); | |||
first_p.it_interval.tv_sec = 0; | |||
first_p.it_interval.tv_usec = 0; | |||
first_p.it_value.tv_sec = MAX_ETIME; | |||
first_p.it_value.tv_usec = 0; | |||
setitimer(ITIMER_PROF, &first_p, NULL); | |||
} | |||
/* return elapsed real seconds since call to init_etime */ | |||
static double get_etime(void) { | |||
struct itimerval v_curr; | |||
struct itimerval r_curr; | |||
struct itimerval p_curr; | |||
getitimer(ITIMER_VIRTUAL, &v_curr); | |||
getitimer(ITIMER_REAL,&r_curr); | |||
getitimer(ITIMER_PROF,&p_curr); | |||
return (double) ((first_r.it_value.tv_sec - r_curr.it_value.tv_sec) + | |||
(first_r.it_value.tv_usec - r_curr.it_value.tv_usec)*1e-6); | |||
} | |||
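/*
 * Illustrative sketch, not part of ftimer.c: the elapsed-time
 * arithmetic behind ftimer_gettod(), written to return seconds
 * directly instead of going through milliseconds. The helper name
 * demo_elapsed_secs is hypothetical.
 */

```c
#include <assert.h>
#include <sys/time.h>

/* Seconds between two gettimeofday() readings. The microsecond
 * difference may be negative; adding it to the whole-second difference
 * still yields the right total. */
double demo_elapsed_secs(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           1e-6 * (double)(end.tv_usec - start.tv_usec);
}
```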
@ -0,0 +1,14 @@ | |||
/* | |||
* Function timers | |||
*/ | |||
typedef void (*ftimer_test_funct)(void *); | |||
/* Estimate the running time of f(argp) using the Unix interval timer. | |||
Return the average of n runs */ | |||
double ftimer_itimer(ftimer_test_funct f, void *argp, int n); | |||
/* Estimate the running time of f(argp) using gettimeofday | |||
Return the average of n runs */ | |||
double ftimer_gettod(ftimer_test_funct f, void *argp, int n); | |||
@ -0,0 +1,101 @@ | |||
/* | |||
* memlib.c - a module that simulates the memory system. Needed because it | |||
* allows us to interleave calls from the student's malloc package | |||
* with the system's malloc package in libc. | |||
*/ | |||
#include <stdio.h> | |||
#include <stdlib.h> | |||
#include <assert.h> | |||
#include <unistd.h> | |||
#include <sys/mman.h> | |||
#include <string.h> | |||
#include <errno.h> | |||
#include "memlib.h" | |||
#include "config.h" | |||
/* private variables */ | |||
static char *mem_start_brk; /* points to first byte of heap */ | |||
static char *mem_brk; /* points one byte past the last byte of heap */ | |||
static char *mem_max_addr; /* largest legal heap address */ | |||
/* | |||
* mem_init - initialize the memory system model | |||
*/ | |||
void mem_init(void) | |||
{ | |||
/* allocate the storage we will use to model the available VM */ | |||
if ((mem_start_brk = (char *)malloc(MAX_HEAP)) == NULL) { | |||
fprintf(stderr, "mem_init: malloc error\n"); | |||
exit(1); | |||
} | |||
mem_max_addr = mem_start_brk + MAX_HEAP; /* max legal heap address */ | |||
mem_brk = mem_start_brk; /* heap is empty initially */ | |||
} | |||
/* | |||
* mem_deinit - free the storage used by the memory system model | |||
*/ | |||
void mem_deinit(void) | |||
{ | |||
free(mem_start_brk); | |||
} | |||
/* | |||
* mem_reset_brk - reset the simulated brk pointer to make an empty heap | |||
*/ | |||
void mem_reset_brk() | |||
{ | |||
mem_brk = mem_start_brk; | |||
} | |||
/* | |||
* mem_sbrk - simple model of the sbrk function. Extends the heap | |||
* by incr bytes and returns the start address of the new area. In | |||
* this model, the heap cannot be shrunk. | |||
*/ | |||
void *mem_sbrk(int incr) | |||
{ | |||
char *old_brk = mem_brk; | |||
if ( (incr < 0) || ((mem_brk + incr) > mem_max_addr)) { | |||
errno = ENOMEM; | |||
fprintf(stderr, "ERROR: mem_sbrk failed. Ran out of memory...\n"); | |||
return (void *)-1; | |||
} | |||
mem_brk += incr; | |||
return (void *)old_brk; | |||
} | |||
/* | |||
* mem_heap_lo - return address of the first heap byte | |||
*/ | |||
void *mem_heap_lo() | |||
{ | |||
return (void *)mem_start_brk; | |||
} | |||
/* | |||
* mem_heap_hi - return address of last heap byte | |||
*/ | |||
void *mem_heap_hi() | |||
{ | |||
return (void *)(mem_brk - 1); | |||
} | |||
/* | |||
* mem_heapsize() - returns the heap size in bytes | |||
*/ | |||
size_t mem_heapsize() | |||
{ | |||
return (size_t)(mem_brk - mem_start_brk); | |||
} | |||
/* | |||
* mem_pagesize() - returns the page size of the system | |||
*/ | |||
size_t mem_pagesize() | |||
{ | |||
return (size_t)getpagesize(); | |||
} |
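/*
 * Illustrative sketch, not part of memlib.c: the same sbrk model on a
 * tiny fixed array, to show how the break pointer moves and when a
 * request is refused. DEMO_MAX_HEAP and the demo_ names are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HEAP 64              /* tiny heap, for illustration only */

static char demo_heap[DEMO_MAX_HEAP];
static char *demo_brk = demo_heap;    /* one past the last byte in use */

/* Grow the heap by incr bytes and return the start of the new area,
 * or (void *)-1 if incr is negative or the heap would overflow. */
void *demo_sbrk(int incr)
{
    char *old_brk = demo_brk;
    if (incr < 0 || incr > (demo_heap + DEMO_MAX_HEAP) - demo_brk)
        return (void *)-1;
    demo_brk += incr;
    return old_brk;
}

size_t demo_heapsize(void)
{
    return (size_t)(demo_brk - demo_heap);
}
```

/*
 * A failed request leaves the break pointer where it was, so the heap
 * size is unchanged after an error.
 */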
@ -0,0 +1,11 @@ | |||
#include <unistd.h> | |||
void mem_init(void); | |||
void mem_deinit(void); | |||
void *mem_sbrk(int incr); | |||
void mem_reset_brk(void); | |||
void *mem_heap_lo(void); | |||
void *mem_heap_hi(void); | |||
size_t mem_heapsize(void); | |||
size_t mem_pagesize(void); | |||
@ -0,0 +1,336 @@ | |||
/* | |||
* A malloc package built on an implicit free list with boundary tags | |||
* and next-fit placement. | |||
* | |||
* Each block is a 4-byte header, an 8-byte-aligned payload, and a | |||
* 4-byte footer. Header and footer both store the payload size with | |||
* the allocated flag in the low bit. Freed blocks are coalesced | |||
* immediately with free neighbors. Allocation searches from the most | |||
* recently placed block (next fit), splits a block when the remainder | |||
* can hold its own header and footer plus at least 8 bytes of payload, | |||
* and extends the heap when nothing fits. Realloc first tries to grow | |||
* the block in place by absorbing free neighbors, and otherwise falls | |||
* back to mm_malloc, copy, and mm_free. | |||
*/ | |||
#include <stdio.h> | |||
#include <stdlib.h> | |||
#include <assert.h> | |||
#include <unistd.h> | |||
#include <string.h> | |||
#include "mm.h" | |||
#include "memlib.h" | |||
/********************************************************* | |||
* NOTE TO STUDENTS: Before you do anything else, please | |||
* provide your team information in the following struct. | |||
********************************************************/ | |||
team_t team = { | |||
/* Team name */ | |||
"team", | |||
/* First member's full name */ | |||
"GentleCold", | |||
/* First member's email address */ | |||
"1952173800@qq.com", | |||
/* Second member's full name (leave blank if none) */ | |||
"", | |||
/* Second member's email address (leave blank if none) */ | |||
"" | |||
}; | |||
/* single word (4) or double word (8) alignment */ | |||
#define ALIGNMENT 8 | |||
/* rounds up to the nearest multiple of ALIGNMENT */ | |||
#define ALIGN(size) (((size) + (ALIGNMENT - 1)) & ~0x7) | |||
// My define | |||
#define WSIZE 4 | |||
#define DSIZE 8 | |||
#define FSIZE 16 | |||
#define CHUNK (1 << 10) | |||
#define MAX(a, b) ((a) > (b) ? (a) : (b)) | |||
#define PARSE(v) ((v) & ~0x7) | |||
#define PACK(v, a) ((v) | (a)) | |||
#define HEAD(bp) ((byte *)(bp) - WSIZE) | |||
#define FOOT(bp) ((byte *)(bp) + SIZE(bp)) | |||
#define SIZE(bp) (PARSE(GET(HEAD(bp)))) | |||
#define ALLOC(bp) (GET(HEAD(bp)) & 0x1) | |||
#define GET(p) (*(word *)(p)) | |||
#define SET(p, v) (*(word *)(p) = (v)) | |||
#define NEXT(bp) (FOOT(bp) + DSIZE) | |||
#define PREV(bp) ((byte *)(bp) - PARSE(GET((byte *)(bp) - DSIZE)) - DSIZE) | |||
typedef unsigned int word; | |||
typedef char byte; | |||
// payload pointers of the first and last blocks in the heap | |||
void *front_p = NULL; | |||
void *tail_p = NULL; | |||
// used for next fit, updated by mm_init, mm_malloc, _coalesce | |||
void *fitted_p = NULL; | |||
// My func | |||
/** | |||
* add a blank chunk and coalesce | |||
* will update tail_p | |||
* @param size align by 8, excluding head and foot | |||
* @return new bp | |||
*/ | |||
static void *_extend(size_t size); | |||
/** | |||
* coalesce the block at bp with its free neighbors, if any | |||
* @param bp payload pointer of the block | |||
* @return payload pointer of the coalesced block | |||
*/ | |||
static void *_coalesce(void *bp); | |||
static void *__coalesce_prev(void *bp); | |||
static void *__coalesce_next(void *bp); | |||
static void *__coalesce_all(void *bp); | |||
/** | |||
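* scan from the first block, take the first free block that fits, and place the request in it | |||
* @deprecated too slow | |||
* @param size payload size, aligned to 8, excluding header and footer | |||
* @return payload pointer of the allocated block, or NULL if no free block fits | |||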
*/ | |||
static void *_first_fit(size_t size); | |||
/** | |||
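* scan from fitted_p, wrapping around to the first block, and place the request in the first free block that fits | |||
* @param size payload size, aligned to 8, excluding header and footer | |||
* @return payload pointer of the allocated block, or NULL if no free block fits | |||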
*/ | |||
static void *_next_fit(size_t size); | |||
/** | |||
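* next fit from fitted_p to the end of the heap; if nothing fits there, best fit over the blocks before fitted_p | |||
* @deprecated lowers throughput | |||
* @param size payload size, aligned to 8, excluding header and footer | |||
* @return payload pointer of the allocated block, or NULL if no free block fits | |||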
*/ | |||
static void *_next_best_fit(size_t size); | |||
/** | |||
* allocate the block and cut sometimes | |||
* @param size align by 8, excluding head and foot | |||
*/ | |||
static void _place(void *ptr, size_t size); | |||
// end | |||
/** | |||
* initialize the malloc package. | |||
* get a new chunk, set front_p and tail_p | |||
*/ | |||
int mm_init(void) { | |||
if ((front_p = mem_sbrk(WSIZE)) == (void *)-1) return -1; // 4-byte pad so that payloads are 8-byte aligned | |||
front_p = (byte *)front_p + DSIZE; // payload of the first block | |||
fitted_p = front_p; // init fitted_p | |||
if (!_extend(CHUNK)) return -1; | |||
return 0; | |||
} | |||
/** | |||
* find a next-fit block, or extend the heap if none fits | |||
*/ | |||
void *mm_malloc(size_t size) { | |||
size_t adjust_size = ALIGN(size); | |||
size_t extend_size; | |||
void *bp; | |||
if ((bp = _next_fit(adjust_size)) != NULL) { | |||
fitted_p = bp; | |||
return bp; | |||
} else { | |||
extend_size = adjust_size; | |||
if (!ALLOC(tail_p)) { | |||
extend_size -= (SIZE(tail_p) + DSIZE); | |||
} | |||
bp = _extend(MAX(extend_size, CHUNK)); | |||
if (bp == NULL) return bp; | |||
_place(bp, adjust_size); | |||
fitted_p = bp; | |||
return bp; | |||
} | |||
} | |||
/** | |||
* free a block and coalesce immediately | |||
*/ | |||
void mm_free(void *ptr) { | |||
size_t size = SIZE(ptr); | |||
SET(HEAD(ptr), PACK(size, 0)); | |||
SET(FOOT(ptr), PACK(size, 0)); | |||
_coalesce(ptr); | |||
} | |||
/** | |||
* grow the block in place when its free neighbors provide enough room, | |||
* otherwise fall back to mm_malloc, copy, and mm_free | |||
*/ | |||
void *mm_realloc(void *ptr, size_t size) { | |||
if (ptr == NULL) return mm_malloc(size); | |||
if (size == 0) { // equivalent to mm_free(ptr) | |||
mm_free(ptr); | |||
return NULL; | |||
} | |||
void *new_ptr; | |||
size_t adjust_size = ALIGN(size); | |||
size_t old_size = SIZE(ptr); | |||
if (adjust_size <= old_size) { | |||
// the block is already large enough: keep it as is, since little memory is wasted | |||
return ptr; | |||
} | |||
size_t next_size = (ptr != tail_p && !ALLOC(NEXT(ptr))) ? SIZE(NEXT(ptr)) + DSIZE : 0; | |||
size_t total_size = old_size + next_size; | |||
if (adjust_size <= total_size) { | |||
__coalesce_next(ptr); | |||
_place(ptr, adjust_size); // just cut | |||
return ptr; | |||
} | |||
size_t prev_size = (ptr != front_p && !ALLOC(PREV(ptr))) ? SIZE(PREV(ptr)) + DSIZE : 0; | |||
total_size += prev_size; | |||
if (adjust_size <= total_size) { // coalesce prev or all | |||
new_ptr = _coalesce(ptr); | |||
memmove(new_ptr, ptr, old_size); | |||
_place(new_ptr, adjust_size); | |||
} else { | |||
if ((new_ptr = mm_malloc(size)) == NULL) return NULL; | |||
memmove(new_ptr, ptr, old_size); | |||
mm_free(ptr); | |||
} | |||
return new_ptr; | |||
} | |||
// my func | |||
static void *_extend(size_t size) { | |||
void *bp; | |||
if ((bp = mem_sbrk(size + DSIZE)) == (void *) - 1) return NULL; | |||
// init chunk | |||
SET(bp, PACK(size, 0)); | |||
bp = (byte *)bp + WSIZE; // step over the header to the payload | |||
SET(FOOT(bp), PACK(size, 0)); | |||
// init mark point | |||
tail_p = bp; | |||
return _coalesce(bp); | |||
} | |||
static void *_coalesce(void *bp) { | |||
// one chunk | |||
if (bp == front_p && bp == tail_p) return bp; | |||
if (bp == front_p || ALLOC(PREV(bp))) { | |||
if (bp == tail_p || ALLOC(NEXT(bp))) return bp; | |||
return __coalesce_next(bp); | |||
} else if (bp == tail_p || ALLOC(NEXT(bp))) { | |||
return __coalesce_prev(bp); | |||
} else { | |||
return __coalesce_all(bp); | |||
} | |||
} | |||
static void *__coalesce_prev(void *bp) { | |||
void *prev = PREV(bp); | |||
size_t new_size = SIZE(prev) + SIZE(bp) + DSIZE; | |||
SET(HEAD(prev), PACK(new_size, 0)); | |||
SET(FOOT(bp), PACK(new_size, 0)); | |||
if (bp == tail_p) tail_p = prev; | |||
if (bp == fitted_p) fitted_p = prev; | |||
return prev; | |||
} | |||
static void *__coalesce_next(void *bp) { | |||
void *next = NEXT(bp); | |||
size_t new_size = SIZE(next) + SIZE(bp) + DSIZE; | |||
SET(HEAD(bp), PACK(new_size, 0)); | |||
SET(FOOT(next), PACK(new_size, 0)); | |||
if (next == tail_p) tail_p = bp; // the merged block is now the last block | |||
if (next == fitted_p) fitted_p = bp; | |||
return bp; | |||
} | |||
static void *__coalesce_all(void *bp) { | |||
void *prev = PREV(bp); | |||
void *next = NEXT(bp); | |||
size_t new_size = SIZE(prev) + SIZE(bp) + SIZE(next) + FSIZE; | |||
SET(HEAD(prev), PACK(new_size, 0)); | |||
SET(FOOT(next), PACK(new_size, 0)); | |||
if (next == tail_p) tail_p = prev; | |||
if (next == fitted_p || bp == fitted_p) fitted_p = prev; | |||
return prev; | |||
} | |||
static void *_first_fit(size_t size) { | |||
void *bp = front_p; | |||
void *after_p = NEXT(tail_p); | |||
while (bp != after_p) { | |||
if (!ALLOC(bp) && SIZE(bp) >= size) { | |||
_place(bp, size); | |||
return bp; | |||
} | |||
bp = NEXT(bp); | |||
} | |||
return NULL; | |||
} | |||
static void *_next_fit(size_t size) { | |||
void *bp = fitted_p; | |||
void *after_p = NEXT(tail_p); | |||
while (bp != after_p) { | |||
if (!ALLOC(bp) && SIZE(bp) >= size) { | |||
_place(bp, size); | |||
return bp; | |||
} | |||
bp = NEXT(bp); | |||
} | |||
bp = front_p; | |||
while (bp != fitted_p) { | |||
if (!ALLOC(bp) && SIZE(bp) >= size) { | |||
_place(bp, size); | |||
return bp; | |||
} | |||
bp = NEXT(bp); | |||
} | |||
return NULL; | |||
} | |||
static void *_next_best_fit(size_t size) { | |||
void *bp = fitted_p; | |||
void *after_p = NEXT(tail_p); | |||
while (bp != after_p) { | |||
if (!ALLOC(bp) && SIZE(bp) >= size) { | |||
_place(bp, size); | |||
return bp; | |||
} | |||
bp = NEXT(bp); | |||
} | |||
bp = front_p; | |||
size_t min = 0; | |||
void *min_p = NULL; | |||
while (bp != fitted_p) { | |||
if (!ALLOC(bp) && SIZE(bp) >= size) { | |||
if (min_p == NULL || SIZE(bp) < min) { | |||
min = SIZE(bp); | |||
min_p = bp; | |||
} | |||
} | |||
bp = NEXT(bp); | |||
} | |||
if (min_p == NULL) return NULL; | |||
_place(min_p, size); | |||
return min_p; | |||
} | |||
static void _place(void *ptr, size_t size) { | |||
size_t p_size = SIZE(ptr); | |||
if (p_size - size >= FSIZE) { | |||
SET(HEAD(ptr), PACK(size, 1)); | |||
SET(FOOT(ptr), PACK(size, 1)); | |||
// the remainder gives up DSIZE bytes to its own header and footer | |||
size_t adjust_size = p_size - size - DSIZE; | |||
SET(HEAD(NEXT(ptr)), PACK(adjust_size, 0)); | |||
SET(FOOT(NEXT(ptr)), PACK(adjust_size, 0)); | |||
if (ptr == tail_p) tail_p = NEXT(ptr); | |||
} else { | |||
SET(HEAD(ptr), PACK(p_size, 1)); | |||
SET(FOOT(ptr), PACK(p_size, 1)); | |||
} | |||
} |
@ -0,0 +1,23 @@ | |||
#include <stdio.h> | |||
extern int mm_init (void); | |||
extern void *mm_malloc (size_t size); | |||
extern void mm_free (void *ptr); | |||
extern void *mm_realloc(void *ptr, size_t size); | |||
/* | |||
* Students work in teams of one or two. Teams enter their team name, | |||
* personal names and login IDs in a struct of this | |||
* type in their mm.c file. | |||
*/ | |||
typedef struct { | |||
char *teamname; /* ID1+ID2 or ID1 */ | |||
char *name1; /* full name of first member */ | |||
char *id1; /* login ID of first member */ | |||
char *name2; /* full name of second member (if any) */ | |||
char *id2; /* login ID of second member */ | |||
} team_t; | |||
extern team_t team; | |||
@ -0,0 +1,16 @@ | |||
20000 | |||
6 | |||
12 | |||
1 | |||
a 0 2040 | |||
a 1 2040 | |||
f 1 | |||
a 2 48 | |||
a 3 4072 | |||
f 3 | |||
a 4 4072 | |||
f 0 | |||
f 2 | |||
a 5 4072 | |||
f 4 | |||
f 5 |
@ -0,0 +1,16 @@ | |||
20000 | |||
6 | |||
12 | |||
1 | |||
a 0 2040 | |||
a 1 4010 | |||
a 2 48 | |||
a 3 4072 | |||
a 4 4072 | |||
a 5 4072 | |||
f 0 | |||
f 1 | |||
f 2 | |||
f 3 | |||
f 4 | |||
f 5 |
@ -0,0 +1,114 @@ | |||
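## Understanding the example code | |||
#### Environment | |||
* As `mem_init` shows, the whole allocator works inside one large, contiguous region of memory that is allocated up front | |||
* Each `mem_sbrk` call likewise extends that region contiguously, simulating the upward growth of the heap | |||
#### Alignment | |||
* The standard `malloc` is also 8-byte aligned, so the mandatory 8-byte alignment requirement can be met | |||
* The returned pointer must address the payload; because every block carries a header and a footer, 4 bytes must be left empty at the start of the heap to satisfy the alignment requirement | |||
* This also means that the smallest free block is 8 bytes, that payloads must be 8-byte aligned, and that the smallest allocated block is 16 bytes | |||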
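The size arithmetic behind these bullets can be sketched as follows. This is a minimal illustration, assuming 4-byte header and footer words; `align8`, `block_bytes`, `first_payload_offset`, and `HEAP_PAD` are hypothetical names and are not taken from the original `mm.c`.

```c
#include <assert.h>
#include <stddef.h>

enum {
    WSIZE = 4,    /* assumed size of one header or footer word, in bytes */
    DSIZE = 8,    /* header + footer overhead of one block */
    HEAP_PAD = 4  /* assumed padding placed before the first header */
};

/* Round a requested payload size up to the next multiple of 8. */
size_t align8(size_t n) {
    return (n + 7) & ~(size_t)7;
}

/* Bytes one allocated block occupies: aligned payload plus header and footer. */
size_t block_bytes(size_t request) {
    return align8(request) + DSIZE;
}

/* Offset of the first payload from an 8-byte-aligned heap base. */
size_t first_payload_offset(void) {
    return HEAP_PAD + WSIZE;
}
```

Under these assumptions a 1-byte request occupies 16 bytes, which matches the minimum allocated block size, and the 4-byte pad places the first payload at offset 8 from the heap base.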
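#### Testing | |||
* The handout ships only a few test cases, so I tested with a more thorough set of `traces` found online | |||
## Version 1 | |||
#### Rules and notes | |||
With the logic understood, the next step is to implement my own version | |||
* For readability, two type aliases are defined | |||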
```c | |||
typedef unsigned int word; | |||
typedef char byte; | |||
``` | |||
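* Unlike the example code, which marks the heap boundaries with a prologue block and an epilogue block, I mark the first and last blocks with just two pointers to improve memory utilization | |||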
```c | |||
// mark the front and tail pos | |||
void *front_p = NULL; | |||
void *tail_p = NULL; | |||
``` | |||
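* This makes the code more complex, and the two pointers must be maintained carefully | |||
* Note also that when `bp == front_p` the value read through `PREV(bp)` is invalid; the same applies at `tail_p` | |||
* For consistency, every `size` passed to a helper function must be aligned before the call and must exclude the header and footer | |||
* A block is split only when the internal fragmentation is at least 16 bytes | |||
* Everything else is much the same as the example | |||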
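The payload-only size convention from the rules above is easiest to see on a tiny buffer. The sketch below is an assumption about how such boundary tags could be laid out, not the original macros: `head`, `foot`, `next_blk`, `prev_blk`, `init_block`, and `demo_merge` are hypothetical helpers that only mirror the roles of `HEAD`, `FOOT`, `NEXT`, and `PREV`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { WSIZE = 4, DSIZE = 8 };   /* assumed word and double-word sizes */

typedef unsigned int word;

/* Read or write one 4-byte tag; memcpy avoids alignment assumptions. */
static word get_word(const char *p) { word w; memcpy(&w, p, sizeof w); return w; }
static void set_word(char *p, word w) { memcpy(p, &w, sizeof w); }

/* bp always points at a payload; the stored size excludes header and footer. */
static char  *head(char *bp)     { return bp - WSIZE; }
static size_t size_of(char *bp)  { return get_word(head(bp)) & ~7u; }
static char  *foot(char *bp)     { return bp + size_of(bp); }
static char  *next_blk(char *bp) { return bp + size_of(bp) + DSIZE; }
static char  *prev_blk(char *bp) {
    size_t prev_size = get_word(bp - DSIZE) & ~7u;  /* previous block's footer */
    return bp - DSIZE - prev_size;
}

static void init_block(char *bp, size_t size, int alloc) {
    set_word(head(bp), (word)size | (word)alloc);
    set_word(foot(bp), (word)size | (word)alloc);
}

/* Build two adjacent free blocks (16 and 8 payload bytes), merge them,
 * and return the payload size of the merged block. */
size_t demo_merge(void) {
    char heap[64] = {0};
    char *a = heap + WSIZE;           /* first payload, right after its header */
    init_block(a, 16, 0);
    char *b = next_blk(a);
    init_block(b, 8, 0);
    assert(prev_blk(b) == a);         /* a's footer leads back to a */

    /* The merged payload absorbs a's footer and b's header: + DSIZE. */
    size_t merged = size_of(a) + size_of(b) + DSIZE;
    set_word(head(a), (word)merged);
    set_word(foot(b), (word)merged);  /* foot(b) is still located via b's old size */
    assert(foot(a) == foot(b));       /* both now name the same footer word */
    return size_of(a);
}
```

Calling `demo_merge()` yields 32, that is 16 + 8 + `DSIZE`: because the tags store payload sizes only, every merge or split has to account for one header and one footer explicitly.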
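#### Bugs and debugging | |||
* `#debug` For a `segmentation fault`, I used `gdb` to read the `size` of the first and last blocks and found the garbage value `0xcdcdcd` at the tail; tracing the `tail_p` variable with `print` statements in the code showed that it was not being updated in `__coalesce_next` | |||
* `#bug1` If the recorded `size` is the payload size, `DSIZE` must be added or subtracted when coalescing and splitting | |||
* `#bug2` Every coalesce must check whether `tail_p` changes, especially in the `__coalesce_next` case | |||
#### Method and score | |||
* Implicit free list, first fit, immediate coalescing | |||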
```c | |||
Results for mm malloc: | |||
trace valid util ops secs Kops | |||
0 yes 99% 5694 0.007579 751 | |||
1 yes 100% 5848 0.006639 881 | |||
2 yes 99% 6648 0.010560 630 | |||
3 yes 100% 5380 0.008016 671 | |||
4 yes 100% 14400 0.000102 140762 | |||
5 yes 92% 4800 0.006677 719 | |||
6 yes 92% 4800 0.005988 802 | |||
7 yes 55% 12000 0.141468 85 | |||
8 yes 51% 24000 0.274197 88 | |||
9 yes 33% 14401 0.128358 112 | |||
10 yes 50% 14401 0.002138 6734 | |||
Total 79% 112372 0.591722 190 | |||
Perf index = 47 (util) + 13 (thru) = 60/100 | |||
``` | |||
## Version 1.1 | |||
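#### `realloc` optimization v1 | |||
* If `new_size <= old_size`, split the block in place instead of allocating | |||
* If the next block is free and the combined size is greater than `new_size`, merge with it | |||
* If the internal fragmentation is too large after merging, the block still has to be split | |||
* This gained six points, but it needs too many checks and does not cover every case | |||
#### `realloc` optimization v2 | |||
* Compute the total size of the free blocks before and after the block; if it is large enough, merge them | |||
* Merging does not destroy the data; after merging, copy the data and then split as needed | |||
* However, on trace 9, merging both the previous and the next free block gives lower memory utilization than merging only the next free block | |||
#### `realloc` optimization v3 | |||
* The final choice is a staged process: in terms of time cost, returning the block directly beats merging only the next block, which beats merging both neighbors, but merging both neighbors is hard to compare against allocating a fresh block | |||
* The open question is whether to insert a merge-both-neighbors case between merging only the next block and reallocating; both variants score the same, and I think including this case is more general | |||
#### Score | |||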
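The staged decision described above can be sketched as a pure function. This is an illustrative reading of the notes rather than the code in `mm_realloc`: `realloc_plan`, `choose_plan`, and the `PLAN_*` names are hypothetical, sizes are payload sizes, and using `>=` for the capacity test is an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum { DSIZE = 8 };  /* assumed header + footer overhead per block */

typedef enum {
    PLAN_IN_PLACE,      /* block is already large enough; split off any excess */
    PLAN_MERGE_NEXT,    /* grow into the free block that follows */
    PLAN_MERGE_AROUND,  /* also absorb the free block before; the data must move */
    PLAN_REALLOCATE     /* allocate elsewhere, copy, free the old block */
} realloc_plan;

/* Try the cheapest option first and fall through to the more expensive ones.
 * Absorbing a free neighbor also reclaims its header and footer (DSIZE). */
realloc_plan choose_plan(size_t old_size, size_t new_size,
                         int prev_is_free, size_t prev_size,
                         int next_is_free, size_t next_size) {
    if (new_size <= old_size) return PLAN_IN_PLACE;

    size_t with_next = old_size + (next_is_free ? next_size + DSIZE : 0);
    if (next_is_free && with_next >= new_size) return PLAN_MERGE_NEXT;

    size_t with_both = with_next + (prev_is_free ? prev_size + DSIZE : 0);
    if (prev_is_free && with_both >= new_size) return PLAN_MERGE_AROUND;

    return PLAN_REALLOCATE;
}
```

For example, growing a 16-byte block to 48 bytes with a free 16-byte block before it and a free 8-byte block after it selects `PLAN_MERGE_AROUND`, since 16 + (8 + 8) + (16 + 8) = 56. In that case the payload moves to a lower address and the source and destination ranges may overlap, so the copy calls for `memmove` rather than `memcpy`.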
```c | |||
Results for mm malloc: | |||
trace valid util ops secs Kops | |||
0 yes 99% 5694 0.007401 769 | |||
1 yes 100% 5848 0.006883 850 | |||
2 yes 99% 6648 0.011138 597 | |||
3 yes 100% 5380 0.008327 646 | |||
4 yes 100% 14400 0.000092 156013 | |||
5 yes 92% 4800 0.006244 769 | |||
6 yes 92% 4800 0.005888 815 | |||
7 yes 55% 12000 0.142196 84 | |||
8 yes 51% 24000 0.277304 87 | |||
9 yes 50% 14401 0.018129 794 | |||
10 yes 86% 14401 0.000132 108933 | |||
Total 84% 112372 0.483734 232 | |||
Perf index = 50 (util) + 15 (thru) = 66/100 | |||
``` | |||
## Version 1.2 | |||
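#### Using `next fit` | |||
* Because large free blocks tend to sit toward the end of the heap, the next search does not have to start from the beginning, which greatly improves throughput | |||
* This introduces a new global variable, `fitted_p`, which must be maintained in several places: initialization, allocation, and coalescing | |||
#### Combining with `best fit` | |||
* `next_fit` lowers memory utilization and `best_fit` lowers throughput, but a compromise is possible: first fit in the part after `fitted_p` and best fit in the part before `fitted_p` | |||
* Best fit only has to find the smallest block that satisfies the request | |||
* However, because the front of the heap is heavily fragmented, testing showed that memory utilization did improve but throughput dropped by more, so in the end I still chose `_next_fit` | |||
#### Score | |||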
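The difference between the two policies can be shown on a toy array instead of a real heap. The sketch below is a simplified model under stated assumptions: `toy_block`, `fits`, `demo_heap`, and the `demo_*` functions are hypothetical, and `rover` plays the role of `fitted_p`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap block: payload size and allocated flag. */
typedef struct { size_t size; int alloc; } toy_block;

static int fits(const toy_block *b, size_t request) {
    return !b->alloc && b->size >= request;
}

/* Plain next fit: scan from the rover to the end, then wrap to the front. */
int next_fit(const toy_block *heap, int n, int rover, size_t request) {
    for (int i = rover; i < n; i++)
        if (fits(&heap[i], request)) return i;
    for (int i = 0; i < rover; i++)
        if (fits(&heap[i], request)) return i;
    return -1;
}

/* Hybrid: first fit from the rover onward, best fit before the rover. */
int next_best_fit(const toy_block *heap, int n, int rover, size_t request) {
    for (int i = rover; i < n; i++)
        if (fits(&heap[i], request)) return i;
    int best = -1;
    for (int i = 0; i < rover; i++)
        if (fits(&heap[i], request) &&
            (best < 0 || heap[i].size < heap[best].size))
            best = i;
    return best;
}

/* Free blocks of 64, 24 and 16 bytes sit before the rover (index 5). */
static const toy_block demo_heap[] = {
    {64, 0}, {8, 1}, {24, 0}, {8, 1}, {16, 0}, {8, 1}
};

int demo_next_fit(void)      { return next_fit(demo_heap, 6, 5, 16); }
int demo_next_best_fit(void) { return next_best_fit(demo_heap, 6, 5, 16); }
```

For a 16-byte request, plain next fit wraps around and takes the 64-byte block at index 0, while the hybrid picks the exact-size block at index 4. The price is that the hybrid always scans every block before the rover, which is consistent with the throughput drop reported above.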
```c | |||
Results for mm malloc: | |||
trace valid util ops secs Kops | |||
0 yes 91% 5694 0.001803 3158 | |||
1 yes 92% 5848 0.001315 4446 | |||
2 yes 97% 6648 0.003706 1794 | |||
3 yes 97% 5380 0.003602 1494 | |||
4 yes 100% 14400 0.000085 169213 | |||
5 yes 91% 4800 0.004207 1141 | |||
6 yes 90% 4800 0.003837 1251 | |||
7 yes 55% 12000 0.057487 209 | |||
8 yes 51% 24000 0.029497 814 | |||
9 yes 50% 14401 0.054370 265 | |||
10 yes 70% 14401 0.000116 124684 | |||
Total 80% 112372 0.160025 702 | |||
Perf index = 48 (util) + 40 (thru) = 88/100 | |||
``` | |||
*** | |||
2022.12.29 ~ 2022.12.30 |
@ -0,0 +1,188 @@ | |||
#include <stdio.h> | |||
#include <stdlib.h> | |||
#include <unistd.h> | |||
#include <sys/times.h> | |||
#include <string.h> | |||
#include "clock.h" | |||
/* Keep track of most recent reading of cycle counter */ | |||
static unsigned cyc_hi = 0; | |||
static unsigned cyc_lo = 0; | |||
void access_counter(unsigned *hi, unsigned *lo) | |||
{ | |||
/* Get cycle counter */ | |||
asm("rdtsc; movl %%edx,%0; movl %%eax,%1" | |||
: "=r" (*hi), "=r" (*lo) | |||
: /* No input */ | |||
: "%edx", "%eax"); | |||
} | |||
void start_counter() | |||
{ | |||
access_counter(&cyc_hi, &cyc_lo); | |||
} | |||
double get_counter() | |||
{ | |||
unsigned ncyc_hi, ncyc_lo; | |||
unsigned hi, lo, borrow; | |||
double result; | |||
/* Get cycle counter */ | |||
access_counter(&ncyc_hi, &ncyc_lo); | |||
/* Do double precision subtraction */ | |||
lo = ncyc_lo - cyc_lo; | |||
borrow = lo > ncyc_lo; | |||
hi = ncyc_hi - cyc_hi - borrow; | |||
result = (double) hi * (1 << 30) * 4 + lo; | |||
if (result < 0) { | |||
fprintf(stderr, "Error: Cycle counter returning negative value: %.0f\n", result); | |||
} | |||
return result; | |||
} | |||
double ovhd() | |||
{ | |||
/* Do it twice to eliminate cache effects */ | |||
int i; | |||
double result; | |||
for (i = 0; i < 2; i++) { | |||
start_counter(); | |||
result = get_counter(); | |||
} | |||
return result; | |||
} | |||
/* Keep track of clock speed */ | |||
double cpu_ghz = 0.0; | |||
/* Get megahertz from /proc/cpuinfo */ | |||
#define MAXBUF 512 | |||
double core_mhz(int verbose) { | |||
static char buf[MAXBUF]; | |||
FILE *fp = fopen("/proc/cpuinfo", "r"); | |||
cpu_ghz = 0.0; | |||
if (!fp) { | |||
fprintf(stderr, "Can't open /proc/cpuinfo to get clock information\n"); | |||
cpu_ghz = 1.0; | |||
return cpu_ghz * 1000.0; | |||
} | |||
while (fgets(buf, MAXBUF, fp)) { | |||
if (strstr(buf, "cpu MHz")) { | |||
double cpu_mhz = 0.0; | |||
sscanf(buf, "cpu MHz\t: %lf", &cpu_mhz); | |||
cpu_ghz = cpu_mhz / 1000.0; | |||
break; | |||
} | |||
} | |||
fclose(fp); | |||
if (cpu_ghz == 0.0) { | |||
fprintf(stderr, "Can't find \"cpu MHz\" in /proc/cpuinfo to get clock information\n"); | |||
cpu_ghz = 1.0; | |||
return cpu_ghz * 1000.0; | |||
} | |||
if (verbose) { | |||
printf("Processor Clock Rate ~= %.4f GHz (extracted from file)\n", cpu_ghz); | |||
} | |||
return cpu_ghz * 1000; | |||
} | |||
double mhz(int verbose) { | |||
double val = core_mhz(verbose); | |||
return val; | |||
} | |||
/* Determine clock rate by measuring cycles | |||
elapsed while sleeping for sleeptime seconds */ | |||
double mhz_full(int verbose, int sleeptime) | |||
{ | |||
double rate; | |||
start_counter(); | |||
sleep(sleeptime); | |||
rate = get_counter()/(1e6*sleeptime); | |||
if (verbose) | |||
printf("Processor Clock Rate ~= %.1f MHz\n", rate); | |||
return rate; | |||
} | |||
///* Version using a default sleeptime */ | |||
//double mhz(int verbose) | |||
//{ | |||
// return mhz_full(verbose, 2); | |||
//} | |||
/** Special counters that compensate for timer interrupt overhead */ | |||
static double cyc_per_tick = 0.0; | |||
#define NEVENT 100 | |||
#define THRESHOLD 1000 | |||
#define RECORDTHRESH 3000 | |||
/* Attempt to see how much time is used by timer interrupt */ | |||
static void callibrate(int verbose) | |||
{ | |||
double oldt; | |||
struct tms t; | |||
clock_t oldc; | |||
int e = 0; | |||
times(&t); | |||
oldc = t.tms_utime; | |||
start_counter(); | |||
oldt = get_counter(); | |||
while (e <NEVENT) { | |||
double newt = get_counter(); | |||
if (newt-oldt >= THRESHOLD) { | |||
clock_t newc; | |||
times(&t); | |||
newc = t.tms_utime; | |||
if (newc > oldc) { | |||
double cpt = (newt-oldt)/(newc-oldc); | |||
if ((cyc_per_tick == 0.0 || cyc_per_tick > cpt) && cpt > RECORDTHRESH) | |||
cyc_per_tick = cpt; | |||
/* | |||
if (verbose) | |||
printf("Saw event lasting %.0f cycles and %d ticks. Ratio = %f\n", | |||
newt-oldt, (int) (newc-oldc), cpt); | |||
*/ | |||
e++; | |||
oldc = newc; | |||
} | |||
oldt = newt; | |||
} | |||
} | |||
if (verbose) | |||
printf("Setting cyc_per_tick to %f\n", cyc_per_tick); | |||
} | |||
static clock_t start_tick = 0; | |||
void start_comp_counter() { | |||
struct tms t; | |||
if (cyc_per_tick == 0.0) | |||
callibrate(1); | |||
times(&t); | |||
start_tick = t.tms_utime; | |||
start_counter(); | |||
} | |||
double get_comp_counter() { | |||
double time = get_counter(); | |||
double ctime; | |||
struct tms t; | |||
clock_t ticks; | |||
times(&t); | |||
ticks = t.tms_utime - start_tick; | |||
ctime = time - ticks*cyc_per_tick; | |||
/* | |||
printf("Measured %.0f cycles. Ticks = %d. Corrected %.0f cycles\n", | |||
time, (int) ticks, ctime); | |||
*/ | |||
return ctime; | |||
} |
@ -0,0 +1,23 @@ | |||
/* Routines for using cycle counter */ | |||
/* Start the counter */ | |||
void start_counter(); | |||
/* Get # cycles since counter started */ | |||
double get_counter(); | |||
/* Measure overhead for counter */ | |||
double ovhd(); | |||
/* Determine clock rate of processor */ | |||
double mhz(int verbose); | |||
/* Determine clock rate of processor, having more control over accuracy */ | |||
double mhz_full(int verbose, int sleeptime); | |||
/** Special counters that compensate for timer interrupt overhead */ | |||
void start_comp_counter(); | |||
double get_comp_counter(); |
@ -0,0 +1,299 @@ | |||
/* Compute time used by a function f that takes two integer args */ | |||
#include <stdlib.h> | |||
#include <sys/times.h> | |||
#include <stdio.h> | |||
#include "clock.h" | |||
#include "fcyc2.h" | |||
static double *values = NULL; | |||
int samplecount = 0; | |||
#define KEEP_VALS 1 | |||
#define KEEP_SAMPLES 1 | |||
#if KEEP_SAMPLES | |||
double *samples = NULL; | |||
#endif | |||
/* Start new sampling process */ | |||
static void init_sampler(int k, int maxsamples) | |||
{ | |||
if (values) | |||
free(values); | |||
values = calloc(k, sizeof(double)); | |||
#if KEEP_SAMPLES | |||
if (samples) | |||
free(samples); | |||
/* Allocate extra for wraparound analysis */ | |||
samples = calloc(maxsamples+k, sizeof(double)); | |||
#endif | |||
samplecount = 0; | |||
} | |||
/* Add new sample. */ | |||
void add_sample(double val, int k) | |||
{ | |||
int pos = 0; | |||
if (samplecount < k) { | |||
pos = samplecount; | |||
values[pos] = val; | |||
} else if (val < values[k-1]) { | |||
pos = k-1; | |||
values[pos] = val; | |||
} | |||
#if KEEP_SAMPLES | |||
samples[samplecount] = val; | |||
#endif | |||
samplecount++; | |||
/* Insertion sort */ | |||
while (pos > 0 && values[pos-1] > values[pos]) { | |||
double temp = values[pos-1]; | |||
values[pos-1] = values[pos]; | |||
values[pos] = temp; | |||
pos--; | |||
} | |||
} | |||
/* Get current minimum */ | |||
double get_min() | |||
{ | |||
return values[0]; | |||
} | |||
/* What is relative error for kth smallest sample */ | |||
double err(int k) | |||
{ | |||
if (samplecount < k) | |||
return 1000.0; | |||
return (values[k-1] - values[0])/values[0]; | |||
} | |||
/* Have k minimum measurements converged within epsilon? */ | |||
int has_converged(int k_arg, double epsilon_arg, int maxsamples) | |||
{ | |||
if ((samplecount >= k_arg) && | |||
((1 + epsilon_arg)*values[0] >= values[k_arg-1])) | |||
return samplecount; | |||
if (samplecount >= maxsamples) | |||
return -1; | |||
return 0; | |||
} | |||
/* Code to clear cache */ | |||
/* Pentium III has 512K L2 cache, which is 128K ints */ | |||
#define ASIZE (1 << 17) | |||
/* Cache block size is 32 bytes */ | |||
#define STRIDE 8 | |||
static int stuff[ASIZE]; | |||
static int sink; | |||
static void clear() | |||
{ | |||
int x = sink; | |||
int i; | |||
for (i = 0; i < ASIZE; i += STRIDE) | |||
x += stuff[i]; | |||
sink = x; | |||
} | |||
double fcyc2_full(test_funct f, int param1, int param2, int clear_cache, | |||
int k, double epsilon, int maxsamples, int compensate) | |||
{ | |||
double result; | |||
init_sampler(k, maxsamples); | |||
if (compensate) { | |||
do { | |||
double cyc; | |||
if (clear_cache) | |||
clear(); | |||
f(param1, param2); /* warm cache */ | |||
start_comp_counter(); | |||
f(param1, param2); | |||
cyc = get_comp_counter(); | |||
add_sample(cyc, k); | |||
} while (!has_converged(k, epsilon, maxsamples) && samplecount < maxsamples); | |||
} else { | |||
do { | |||
double cyc; | |||
if (clear_cache) | |||
clear(); | |||
f(param1, param2); /* warm cache */ | |||
start_counter(); | |||
f(param1, param2); | |||
cyc = get_counter(); | |||
add_sample(cyc, k); | |||
} while (!has_converged(k, epsilon, maxsamples) && samplecount < maxsamples); | |||
} | |||
#ifdef DEBUG | |||
{ | |||
int i; | |||
printf(" %d smallest values: [", k); | |||
for (i = 0; i < k; i++) | |||
printf("%.0f%s", values[i], i==k-1 ? "]\n" : ", "); | |||
} | |||
#endif | |||
result = values[0]; | |||
#if !KEEP_VALS | |||
free(values); | |||
values = NULL; | |||
#endif | |||
return result; | |||
} | |||
double fcyc2(test_funct f, int param1, int param2, int clear_cache) | |||
{ | |||
return fcyc2_full(f, param1, param2, clear_cache, 3, 0.01, 500, 0); | |||
} | |||
/******************* Version that uses gettimeofday *************/ | |||
static double Mhz = 0.0; | |||
#include <sys/time.h> | |||
static struct timeval tstart; | |||
/* Record current time */ | |||
void start_counter_tod() | |||
{ | |||
if (Mhz == 0) | |||
Mhz = mhz_full(0, 10); | |||
gettimeofday(&tstart, NULL); | |||
} | |||
/* Get number of cycles (elapsed microseconds * MHz) since last call to start_counter_tod */ | |||
double get_counter_tod() | |||
{ | |||
struct timeval tfinish; | |||
long sec, usec; | |||
gettimeofday(&tfinish, NULL); | |||
sec = tfinish.tv_sec - tstart.tv_sec; | |||
usec = tfinish.tv_usec - tstart.tv_usec; | |||
return (1e6 * sec + usec)*Mhz; | |||
} | |||
/** Special counters that compensate for timer interrupt overhead */ | |||
static double cyc_per_tick = 0.0; | |||
#define NEVENT 100 | |||
#define THRESHOLD 1000 | |||
#define RECORDTHRESH 3000 | |||
/* Attempt to see how much time is used by timer interrupt */ | |||
static void callibrate(int verbose) | |||
{ | |||
double oldt; | |||
struct tms t; | |||
clock_t oldc; | |||
int e = 0; | |||
times(&t); | |||
oldc = t.tms_utime; | |||
start_counter_tod(); | |||
oldt = get_counter_tod(); | |||
while (e <NEVENT) { | |||
double newt = get_counter_tod(); | |||
if (newt-oldt >= THRESHOLD) { | |||
clock_t newc; | |||
times(&t); | |||
newc = t.tms_utime; | |||
if (newc > oldc) { | |||
double cpt = (newt-oldt)/(newc-oldc); | |||
if ((cyc_per_tick == 0.0 || cyc_per_tick > cpt) && cpt > RECORDTHRESH) | |||
cyc_per_tick = cpt; | |||
/* | |||
if (verbose) | |||
printf("Saw event lasting %.0f cycles and %d ticks. Ratio = %f\n", | |||
newt-oldt, (int) (newc-oldc), cpt); | |||
*/ | |||
e++; | |||
oldc = newc; | |||
} | |||
oldt = newt; | |||
} | |||
} | |||
if (verbose) | |||
printf("Setting cyc_per_tick to %f\n", cyc_per_tick); | |||
} | |||
static clock_t start_tick = 0; | |||
void start_comp_counter_tod() { | |||
struct tms t; | |||
if (cyc_per_tick == 0.0) | |||
callibrate(0); | |||
times(&t); | |||
start_tick = t.tms_utime; | |||
start_counter_tod(); | |||
} | |||
double get_comp_counter_tod() { | |||
double time = get_counter_tod(); | |||
double ctime; | |||
struct tms t; | |||
clock_t ticks; | |||
times(&t); | |||
ticks = t.tms_utime - start_tick; | |||
ctime = time - ticks*cyc_per_tick; | |||
/* | |||
printf("Measured %.0f cycles. Ticks = %d. Corrected %.0f cycles\n", | |||
time, (int) ticks, ctime); | |||
*/ | |||
return ctime; | |||
} | |||
double fcyc2_full_tod(test_funct f, int param1, int param2, int clear_cache, | |||
int k, double epsilon, int maxsamples, int compensate) | |||
{ | |||
double result; | |||
init_sampler(k, maxsamples); | |||
if (compensate) { | |||
do { | |||
double cyc; | |||
if (clear_cache) | |||
clear(); | |||
start_comp_counter_tod(); | |||
f(param1, param2); | |||
cyc = get_comp_counter_tod(); | |||
add_sample(cyc, k); | |||
} while (!has_converged(k, epsilon, maxsamples) && samplecount < maxsamples); | |||
} else { | |||
do { | |||
double cyc; | |||
if (clear_cache) | |||
clear(); | |||
start_counter_tod(); | |||
f(param1, param2); | |||
cyc = get_counter_tod(); | |||
add_sample(cyc, k); | |||
} while (!has_converged(k, epsilon, maxsamples) && samplecount < maxsamples); | |||
} | |||
#ifdef DEBUG | |||
{ | |||
int i; | |||
printf(" %d smallest values: [", k); | |||
for (i = 0; i < k; i++) | |||
printf("%.0f%s", values[i], i==k-1 ? "]\n" : ", "); | |||
} | |||
#endif | |||
result = values[0]; | |||
#if !KEEP_VALS | |||
free(values); | |||
values = NULL; | |||
#endif | |||
return result; | |||
} | |||
double fcyc2_tod(test_funct f, int param1, int param2, int clear_cache) | |||
{ | |||
return fcyc2_full_tod(f, param1, param2, clear_cache, 3, 0.01, 20, 0); | |||
} | |||
@ -0,0 +1,41 @@ | |||
/* Find number of cycles used by function that takes 2 arguments */ | |||
/* Function to be tested takes two integer arguments */ | |||
typedef int (*test_funct)(int, int); | |||
/* Compute time used by function f */ | |||
double fcyc2(test_funct f, int param1, int param2, int clear_cache); | |||
/********* These routines are used to help with the analysis *********/ | |||
/* | |||
Parameters: | |||
k: How many samples must be within epsilon for convergence | |||
epsilon: What is tolerance | |||
maxsamples: How many samples until give up? | |||
*/ | |||
/* Full version of fcyc with control over parameters */ | |||
double fcyc2_full(test_funct f, int param1, int param2, int clear_cache, | |||
int k, double epsilon, int maxsamples, int compensate); | |||
/* Get current minimum */ | |||
double get_min(); | |||
/* What is convergence status for k minimum measurements within epsilon | |||
Returns 0 if not converged, #samples if converged, and -1 if can't | |||
reach convergence | |||
*/ | |||
int has_converged(int k, double epsilon, int maxsamples); | |||
/* What is error of current measurement */ | |||
double err(int k); | |||
/************* Try other clocking methods *****************/ | |||
/* Full version that uses the time of day clock */ | |||
double fcyc2_full_tod(test_funct f, int param1, int param2, int clear_cache, | |||
int k, double epsilon, int maxsamples, int compensate); | |||
double fcyc2_tod(test_funct f, int param1, int param2, int clear_cache); |
@ -0,0 +1,116 @@ | |||
/* mountain.c - Generate the memory mountain. */ | |||
/* $begin mountainmain */ | |||
#include <stdlib.h> | |||
#include <stdio.h> | |||
#include "fcyc2.h" /* measurement routines */ | |||
#include "clock.h" /* routines to access the cycle counter */ | |||
#define MINBYTES (1 << 14) /* First working set size */ | |||
#define MAXBYTES (1 << 27) /* Last working set size */ | |||
#define MAXSTRIDE 15 /* Stride x8 bytes */ | |||
#define MAXELEMS MAXBYTES/sizeof(long) | |||
/* $begin mountainfuns */ | |||
long data[MAXELEMS]; /* The global array we'll be traversing */ | |||
/* $end mountainfuns */ | |||
/* $end mountainmain */ | |||
void init_data(long *data, int n); | |||
int test(int elems, int stride); | |||
double run(int size, int stride, double Mhz); | |||
/* $begin mountainmain */ | |||
int main() | |||
{ | |||
int size; /* Working set size (in bytes) */ | |||
int stride; /* Stride (in array elements) */ | |||
double Mhz; /* Clock frequency */ | |||
FILE *fp = NULL; | |||
fp = fopen("mountain.txt", "w+"); | |||
if (fp == NULL) { | |||
perror("mountain.txt"); | |||
exit(1); | |||
} | |||
init_data(data, MAXELEMS); /* Initialize each element in data */ | |||
Mhz = mhz(0); /* Estimate the clock frequency */ | |||
/* $end mountainmain */ | |||
/* Not shown in the text */ | |||
fprintf(fp, "Clock frequency is approx. %.1f MHz\n", Mhz); | |||
fprintf(fp, "Memory mountain (MB/sec)\n"); | |||
fprintf(fp, "\t"); | |||
for (stride = 1; stride <= MAXSTRIDE; stride++) | |||
fprintf(fp, "s%d\t", stride); | |||
fprintf(fp, "\n"); | |||
/* $begin mountainmain */ | |||
for (size = MAXBYTES; size >= MINBYTES; size >>= 1) { | |||
/* $end mountainmain */ | |||
/* Not shown in the text */ | |||
if (size > (1 << 20)) | |||
fprintf(fp, "%dm\t", size / (1 << 20)); | |||
else | |||
fprintf(fp, "%dk\t", size / 1024); | |||
/* $begin mountainmain */ | |||
for (stride = 1; stride <= MAXSTRIDE; stride++) { | |||
fprintf(fp, "%.0f\t", run(size, stride, Mhz)); | |||
} | |||
fprintf(fp, "\n"); | |||
} | |||
fclose(fp); | |||
exit(0); | |||
} | |||
/* $end mountainmain */ | |||
/* init_data - initializes the array */ | |||
void init_data(long *data, int n) | |||
{ | |||
int i; | |||
for (i = 0; i < n; i++) | |||
data[i] = i; | |||
} | |||
/* $begin mountainfuns */ | |||
/* test - Iterate over first "elems" elements of array "data" with | |||
* stride of "stride", using 4x4 loop unrolling. | |||
*/ | |||
int test(int elems, int stride) | |||
{ | |||
long i, sx2 = stride*2, sx3 = stride*3, sx4 = stride*4; | |||
long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0; | |||
long length = elems; | |||
long limit = length - sx4; | |||
/* Combine 4 elements at a time */ | |||
for (i = 0; i < limit; i += sx4) { | |||
acc0 = acc0 + data[i]; | |||
acc1 = acc1 + data[i+stride]; | |||
acc2 = acc2 + data[i+sx2]; | |||
acc3 = acc3 + data[i+sx3]; | |||
} | |||
/* Finish any remaining elements */ | |||
for (; i < length; i += stride) { | |||
acc0 = acc0 + data[i]; | |||
} | |||
return ((acc0 + acc1) + (acc2 + acc3)); | |||
} | |||
/* run - Run test(elems, stride) and return read throughput (MB/s). | |||
* "size" is in bytes, "stride" is in array elements, and Mhz is | |||
* CPU clock frequency in Mhz. | |||
*/ | |||
double run(int size, int stride, double Mhz) | |||
{ | |||
double cycles; | |||
int elems = size / sizeof(double); | |||
test(elems, stride); /* Warm up the cache */ //line:mem:warmup | |||
cycles = fcyc2(test, elems, stride, 0); /* Call test(elems,stride) */ //line:mem:fcyc | |||
return (size / stride) / (cycles / Mhz); /* Convert cycles to MB/s */ //line:mem:bwcompute | |||
} | |||
/* $end mountainfuns */ | |||
@ -0,0 +1,45 @@ | |||
// | |||
// Created by GentleCold on 2022/11/7. | |||
// | |||
#ifndef CSAPPLEARNING_MOUNTAIN_H | |||
#define CSAPPLEARNING_MOUNTAIN_H | |||
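#include <stdio.h> | |||
#include "fcyc2.h" /* fcyc2() and the test_funct type */ | |||
#define MAXELEMS 10000 | |||
long data[MAXELEMS]; | |||
/* Sum the first "elems" elements of "data" with the given stride, 4x4 unrolled */ | |||
int read(int elems, int stride) { | |||
long i, sx2 = stride * 2, sx3 = stride * 3, sx4 = stride * 4; | |||
long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0; | |||
long length = elems; | |||
long limit = length - sx4; | |||
/* Combine 4 elements at a time */ | |||
for (i = 0; i < limit; i += sx4) { | |||
acc0 += data[i]; | |||
acc1 += data[i + stride]; | |||
acc2 += data[i + sx2]; | |||
acc3 += data[i + sx3]; | |||
} | |||
/* Finish any remaining elements */ | |||
for (; i < length; i += stride) { | |||
acc0 += data[i]; | |||
} | |||
return ((acc0 + acc1) + (acc2 + acc3)); | |||
} | |||
/* Run read(elems, stride) and return the read throughput in MB/s */ | |||
double run(int size, int stride, double Mhz) { | |||
double cycles; | |||
int elems = size / sizeof(double); | |||
read(elems, stride); /* warm up the cache */ | |||
cycles = fcyc2(read, elems, stride, 0); /* same call shape as in mountain.c */ | |||
return (size / stride) / (cycles / Mhz); /* convert cycles to MB/s */ | |||
} | |||
/* Not implemented yet */ | |||
int mountain() { | |||
return 0; | |||
} | |||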
#endif //CSAPPLEARNING_MOUNTAIN_H |