Welcome to this simulator. The idea is to gain familiarity with threads by
seeing how they interleave; the simulator, x86.py, will help you in
gaining this understanding.

The simulator mimics the execution of short assembly sequences by multiple
threads. Note that the OS code that would run (for example, to perform a
context switch) is *not* shown; thus, all you see is the interleaving of the
user code.

The assembly code that is run is based on x86, but somewhat simplified.
In this instruction set, there are four general-purpose registers
(%ax, %bx, %cx, %dx), a program counter (PC), and a small set of instructions
which will be enough for our purposes. We've also added a few extra GP
registers (%ex, %fx) which don't quite match anything in x86 land
(but that is OK).

Here is an example code snippet that we will be able to run:

.main
mov 2000, %ax   # get the value at the address
add $1, %ax     # increment it
mov %ax, 2000   # store it back
halt

The code is easy to understand. The first instruction, an x86 "mov", simply
loads a value from the address specified by 2000 into the register %ax.
Addresses, in this subset of x86, can take some of the following forms:

  2000          -> the number (2000) is the address
  (%cx)         -> contents of register (in parentheses) forms the address
  1000(%dx)     -> the number + contents of the register form the address
  10(%ax,%bx)   -> the number + reg1 + reg2 forms the address
  10(%ax,%bx,4) -> the number + reg1 + (reg2*scaling) forms the address

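The address forms above all reduce to one formula: number + reg1 + (reg2 * scaling),
with any missing part left out. The Python sketch below illustrates that arithmetic.
The helper name effective_address and the register contents are made up for this
example; this is not how x86.py itself is written.

```python
# Hypothetical helper, not part of x86.py: computes the address each form denotes.
def effective_address(regs, offset=0, base=None, index=None, scale=1):
    """Return offset + regs[base] + regs[index] * scale, skipping absent parts."""
    addr = offset
    if base is not None:
        addr += regs[base]
    if index is not None:
        addr += regs[index] * scale
    return addr


# Made-up register contents, chosen only to make the arithmetic visible.
regs = {"ax": 100, "bx": 3, "cx": 2000, "dx": 50}

print(effective_address(regs, offset=2000))                                # 2000          -> 2000
print(effective_address(regs, base="cx"))                                  # (%cx)         -> 2000
print(effective_address(regs, offset=1000, base="dx"))                     # 1000(%dx)     -> 1050
print(effective_address(regs, offset=10, base="ax", index="bx"))           # 10(%ax,%bx)   -> 113
print(effective_address(regs, offset=10, base="ax", index="bx", scale=4))  # 10(%ax,%bx,4) -> 122
```
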
To store a value, the same "mov" instruction is used, but this time with the
arguments reversed, e.g.:

  mov %ax, 2000

The "add" instruction, from the sequence above, should be clear: it adds an
immediate value (specified by $1) to the register specified in the second
argument (i.e., %ax = %ax + 1).

Thus, we now can understand the code sequence above: it loads the value at
address 2000, adds 1 to it, and then stores the value back into address 2000.

The fake-ish "halt" instruction just stops running this thread.

Let's run the simulator and see how this all works! Assume the above code
sequence is in the file "simple-race.s".

prompt> ./x86.py -p simple-race.s -t 1

       Thread 0
1000 mov 2000, %ax
1001 add $1, %ax
1002 mov %ax, 2000
1003 halt

prompt>

The arguments used here specify the program (-p) and the number of threads
(-t 1). A third setting, the interrupt interval (-i), is not given on this
command line; it controls how often a scheduler will be woken and run to
switch to a different task. Because there is only one thread in this example,
this interval does not matter.

The output is easy to read: the simulator prints the program counter (here
shown from 1000 to 1003) and the instruction that gets executed. Note that we
assume (unrealistically) that all instructions just take up a single byte in
memory; in x86, instructions are variable-sized and would take up from one to
a small number of bytes.

We can use more detailed tracing to get a better sense of how machine state
changes during the execution:

prompt> ./x86.py -p simple-race.s -t 1 -M 2000 -R ax,bx

 2000      ax    bx          Thread 0
    ?       ?     ?
    ?       ?     ?   1000 mov 2000, %ax
    ?       ?     ?   1001 add $1, %ax
    ?       ?     ?   1002 mov %ax, 2000
    ?       ?     ?   1003 halt

Oops! Forgot the -c flag (which actually computes the answers for you).

prompt> ./x86.py -p simple-race.s -t 1 -M 2000 -R ax,bx -c

 2000      ax    bx          Thread 0
    0       0     0
    0       0     0   1000 mov 2000, %ax
    0       1     0   1001 add $1, %ax
    1       1     0   1002 mov %ax, 2000
    1       1     0   1003 halt

By using the -M flag, we can trace memory locations (a comma-separated list
lets you trace more than one, e.g., 2000,3000); by using the -R flag we can
track the values inside specific registers.

The values on the left show the memory/register contents AFTER the instruction
on the right has executed. For example, after the "add" instruction, you can
see that %ax has been incremented to the value 1; after the second "mov"
instruction (at PC=1002), you can see that the memory contents at 2000 are
now also incremented.

There are a few more instructions you'll need to know, so let's get to them
now. Here is a code snippet of a loop:

.main
.top
sub  $1,%dx
test $0,%dx
jgte .top
halt

A few things have been introduced here. First is the "test" instruction.
This instruction takes two arguments and compares them; it then sets implicit
"condition codes" (kind of like 1-bit registers) which subsequent instructions
can act upon.

In this case, the other new instruction is the "jump" instruction (in this
case, "jgte" which stands for "jump if greater than or equal to"). This
instruction jumps if the second value is greater than or equal to the first
in the test (here, if %dx >= 0).

One last point: to really make this code work, dx must be initialized to 1 or
greater.

Thus, we run the program like this:

prompt> ./x86.py -p loop.s -t 1 -a dx=3 -R dx -C -c

   dx   >= >  <= <  != ==        Thread 0
    3   0  0  0  0  0  0
    2   0  0  0  0  0  0  1000 sub  $1,%dx
    2   1  1  0  0  1  0  1001 test $0,%dx
    2   1  1  0  0  1  0  1002 jgte .top
    1   1  1  0  0  1  0  1000 sub  $1,%dx
    1   1  1  0  0  1  0  1001 test $0,%dx
    1   1  1  0  0  1  0  1002 jgte .top
    0   1  1  0  0  1  0  1000 sub  $1,%dx
    0   1  0  1  0  0  1  1001 test $0,%dx
    0   1  0  1  0  0  1  1002 jgte .top
   -1   1  0  1  0  0  1  1000 sub  $1,%dx
   -1   0  0  1  1  1  0  1001 test $0,%dx
   -1   0  0  1  1  1  0  1002 jgte .top
   -1   0  0  1  1  1  0  1003 halt

The "-R dx" flag traces the value of %dx; the "-C" flag traces the values of
the condition codes that get set by a test instruction. Finally, the "-a dx=3"
flag sets the %dx register to the value 3 to start with.

As you can see from the trace, the "sub" instruction slowly lowers the value
of %dx. The first few times "test" is called, only the ">=", ">", and "!="
conditions get set. When %dx reaches 0, "test" finds %dx and 0 to be equal and
sets ">=", "<=", and "=="; because ">=" is still set, the jump is taken once
more. Only the last "test" in the trace, with %dx at -1, clears ">=", and thus
the subsequent jump does NOT take place, and the program finally halts.

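To make the condition-code columns easier to read, here is a small Python sketch
of what "test" appears to compute. It assumes each code compares the SECOND
operand against the first, which is what the traced values for "test $0,%dx"
suggest. The function name condition_codes is invented for this example and is
not taken from x86.py.

```python
def condition_codes(first, second):
    """Model `test first, second`: each code compares the SECOND operand to the first."""
    return {
        ">=": int(second >= first),
        ">": int(second > first),
        "<=": int(second <= first),
        "<": int(second < first),
        "!=": int(second != first),
        "==": int(second == first),
    }


# Mirror `test $0, %dx` for a few values of %dx and report what jgte would do.
for dx in (2, 1, 0, -1):
    codes = condition_codes(0, dx)
    taken = bool(codes[">="])  # jgte jumps only while ">=" is set
    print(dx, list(codes.values()), "jump" if taken else "fall through")
```

The printed lists follow the same column order as the -C trace (>=, >, <=, <, !=, ==).
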
Now, finally, we get to a more interesting case, i.e., a race condition with
multiple threads. Let's look at the code first:

.main
.top
# critical section
mov 2000, %ax   # get the value at the address
add $1, %ax     # increment it
mov %ax, 2000   # store it back

# see if we're still looping
sub  $1, %bx
test $0, %bx
jgt .top

halt

The code has a critical section which loads the value of a variable
(at address 2000), then adds 1 to the value, then stores it back.

The code after just decrements a loop counter (in %bx), tests if it
is greater than zero, and if so, jumps back to the top
to the critical section again.

prompt> ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -R bx -c

 2000      bx          Thread 0                 Thread 1
    0       1
    0       1   1000 mov 2000, %ax
    0       1   1001 add $1, %ax
    1       1   1002 mov %ax, 2000
    1       0   1003 sub  $1, %bx
    1       0   1004 test $0, %bx
    1       0   1005 jgt .top
    1       0   1006 halt
    1       1   ----- Halt;Switch -----  ----- Halt;Switch -----
    1       1                            1000 mov 2000, %ax
    1       1                            1001 add $1, %ax
    2       1                            1002 mov %ax, 2000
    2       0                            1003 sub  $1, %bx
    2       0                            1004 test $0, %bx
    2       0                            1005 jgt .top
    2       0                            1006 halt

Here you can see each thread ran once, and each updated the shared
variable at address 2000 once, thus resulting in a count of two there.

The "Halt;Switch" line is inserted whenever a thread halts and another
thread must be run.

One last example: run the same thing above, but with a smaller interrupt
frequency. Here is what that will look like:

prompt> ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -i 2

 2000          Thread 0                 Thread 1
    ?
    ?   1000 mov 2000, %ax
    ?   1001 add $1, %ax
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1000 mov 2000, %ax
    ?                            1001 add $1, %ax
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1002 mov %ax, 2000
    ?   1003 sub  $1, %bx
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1002 mov %ax, 2000
    ?                            1003 sub  $1, %bx
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1004 test $0, %bx
    ?   1005 jgt .top
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?                            1004 test $0, %bx
    ?                            1005 jgt .top
    ?   ------ Interrupt ------  ------ Interrupt ------
    ?   1006 halt
    ?   ----- Halt;Switch -----  ----- Halt;Switch -----
    ?                            1006 halt

As you can see, each thread is interrupted every 2 instructions, as we specify
via the "-i 2" flag. What is the value of memory[2000] throughout this run?
What should it have been?

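If you want to check your answer without the simulator at hand, the following
Python sketch models the same seven-instruction program under a fixed interrupt
interval. It is a simplified stand-in written for this README, not x86.py
itself: the names run and step, the round-robin order, and the choice to give
every thread the same starting %bx are all assumptions of the sketch.

```python
def step(thread, memory):
    """Execute one instruction of the looping-race program for one thread."""
    pc = thread["pc"]
    thread["pc"] = pc + 1
    if pc == 0:                       # mov 2000, %ax
        thread["ax"] = memory[2000]
    elif pc == 1:                     # add $1, %ax
        thread["ax"] += 1
    elif pc == 2:                     # mov %ax, 2000
        memory[2000] = thread["ax"]
    elif pc == 3:                     # sub $1, %bx
        thread["bx"] -= 1
    elif pc == 4:                     # test $0, %bx
        thread["gt"] = thread["bx"] > 0
    elif pc == 5:                     # jgt .top
        if thread["gt"]:
            thread["pc"] = 0
    else:                             # halt
        thread["done"] = True


def run(num_threads=2, interval=2, loops=1):
    """Round-robin the threads, switching after `interval` instructions or a halt."""
    memory = {2000: 0}
    threads = [{"pc": 0, "ax": 0, "bx": loops, "gt": False, "done": False}
               for _ in range(num_threads)]
    history = []                      # memory[2000] after every executed instruction
    current = 0
    while any(not t["done"] for t in threads):
        thread = threads[current]
        for _ in range(interval):
            if thread["done"]:
                break
            step(thread, memory)
            history.append(memory[2000])
        current = (current + 1) % num_threads
    return memory[2000], history


print(run(interval=2))     # interleaved as in the trace above
print(run(interval=100))   # each thread runs to completion before the switch
```

Comparing the two printed histories shows where the two schedules start to differ.
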
Now let's give a little more information on what can be simulated
with this program. The full set of registers: %ax, %bx, %cx, %dx, the extra
%ex and %fx mentioned earlier, and the PC. Stack support is limited: the
instruction list below includes push, pop, and call, which work on a stack
defined by the sp register, but it lists no return instruction.

The full set of instructions simulated are:

mov immediate, register     # moves immediate value to register
mov memory, register        # loads from memory into register
mov register, register      # moves value from one register to other
mov register, memory        # stores register contents in memory
mov immediate, memory       # stores immediate value in memory

add immediate, register     # register  = register  + immediate
add register1, register2    # register2 = register2 + register1
sub immediate, register     # register  = register  - immediate
sub register1, register2    # register2 = register2 - register1

neg register                # negates contents of register

test immediate, register    # compare immediate and register (set condition codes)
test register, immediate    # same but register and immediate
test register, register     # same but register and register

jne                         # jump if test'd values are not equal
je                          #                       ... equal
jlt                         #     ... second is less than first
jlte                        #               ... less than or equal
jgt                         #            ... is greater than
jgte                        #               ... greater than or equal

push memory or register     # push value in memory or from reg onto stack
                            # stack is defined by sp register
pop [register]              # pop value off stack (into optional register)
call label                  # call function at label

xchg register, memory       # atomic exchange:
                            #   put value of register into memory
                            #   return old contents of memory into reg
                            #   do both things atomically

yield                       # switch to the next thread in the runqueue

nop                         # no op

Notes:
- 'immediate' is something of the form $number
- 'memory' is of the form 'number' or '(reg)' or 'number(reg)' or
  'number(reg,reg)' or 'number(reg,reg,scale)' (as described above)
- 'register' is one of %ax, %bx, %cx, %dx

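Of the instructions listed, "xchg" is the one whose description is easiest to
misread, so here is a short Python sketch of its effect. The function name and
the use of a threading.Lock to stand in for hardware atomicity are assumptions
made for illustration; x86.py may implement this differently.

```python
import threading

# Stand-in for the hardware guarantee that the swap cannot be interrupted.
_atomic = threading.Lock()


def xchg(regs, reg, memory, addr):
    """Swap regs[reg] with memory[addr] as one indivisible step."""
    with _atomic:
        regs[reg], memory[addr] = memory[addr], regs[reg]


memory = {2000: 0}   # made-up initial memory contents
regs = {"ax": 1}     # made-up initial register contents

xchg(regs, "ax", memory, 2000)
print(regs["ax"], memory[2000])   # prints: 0 1
```

After the call, the register holds the old memory contents and memory holds the
old register value, which is the behavior the comment on "xchg" describes.
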
Finally, the full set of options to the simulator is available with
the -h flag:

Usage: x86.py [options]

Options:
  -s SEED, --seed=SEED  the random seed
  -t NUMTHREADS, --threads=NUMTHREADS
                        number of threads
  -p PROGFILE, --program=PROGFILE
                        source program (in .s)
  -i INTFREQ, --interrupt=INTFREQ
                        interrupt frequency
  -P PROCSCHED, --procsched=PROCSCHED
                        control exactly which thread runs when
  -r, --randints        if interrupts are random
  -a ARGV, --argv=ARGV  comma-separated per-thread args (e.g., ax=1,ax=2 sets
                        thread 0 ax reg to 1 and thread 1 ax reg to 2);
                        specify multiple regs per thread via colon-separated
                        list (e.g., ax=1:bx=2,cx=3 sets thread 0 ax and bx and
                        just cx for thread 1)
  -L LOADADDR, --loadaddr=LOADADDR
                        address where to load code
  -m MEMSIZE, --memsize=MEMSIZE
                        size of address space (KB)
  -M MEMTRACE, --memtrace=MEMTRACE
                        comma-separated list of addrs to trace (e.g.,
                        20000,20001)
  -R REGTRACE, --regtrace=REGTRACE
                        comma-separated list of regs to trace (e.g.,
                        ax,bx,cx,dx)
  -C, --cctrace         should we trace condition codes
  -S, --printstats      print some extra stats
  -v, --verbose         print some extra info
  -H HEADERCOUNT, --headercount=HEADERCOUNT
                        how often to print a row header
  -c, --compute         compute answers for me

Most are obvious. Usage of -r turns on a random interrupter (from 1 to intfreq
as specified by -i), which can make for more fun during homework problems.

-P lets you specify exactly which threads run when;
   e.g., 11000 would run thread 1 for 2 instructions, then thread 0 for 3,
   then repeat

-L specifies where in the address space to load the code.

-m specifies the size of the address space (in KB).

-S prints some extra stats.

-c lets you see the values of the traced registers or memory values
   (otherwise they show up as question marks).

-H lets you specify how often to print a row header (useful for long traces).

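The -a and -P strings are compact, so it may help to see them unpacked. The
Python sketch below follows only the format described in the help text above.
The function names parse_argv and schedule are invented for this example, the
real parsing inside x86.py may differ, and the schedule expansion ignores what
happens once a thread halts.

```python
from itertools import cycle, islice


def parse_argv(spec):
    """Split a -a style string into one {register: value} dict per thread."""
    per_thread = []
    for thread_spec in spec.split(","):         # commas separate threads
        regs = {}
        for assignment in thread_spec.split(":"):  # colons separate registers
            name, value = assignment.split("=")
            regs[name] = int(value)
        per_thread.append(regs)
    return per_thread


def schedule(procsched, steps):
    """Expand a -P style string into the thread picked for each of `steps` instructions."""
    return [int(ch) for ch in islice(cycle(procsched), steps)]


print(parse_argv("ax=1:bx=2,cx=3"))   # [{'ax': 1, 'bx': 2}, {'cx': 3}]
print(schedule("11000", 7))           # [1, 1, 0, 0, 0, 1, 1]
```
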
Now you have the basics in place; read the questions at the end of the chapter
to study this race condition and related issues in more depth.