Reading Notes on 《Python源码剖析》 (8): The Python Virtual Machine Framework


Chapter 8: The Python Virtual Machine Framework

  • ####PyFrameObject

    • It is defined as:

      typedef struct _frame {
          PyObject_VAR_HEAD
          struct _frame *f_back; /* previous frame, or NULL */
          PyCodeObject *f_code; /* code segment */
          PyObject *f_builtins; /* builtin symbol table (PyDictObject) */
          PyObject *f_globals; /* global symbol table (PyDictObject) */
          PyObject *f_locals; /* local symbol table (any mapping) */
          PyObject **f_valuestack; /* points after the last local */
          /* Next free slot in f_valuestack. Frame creation sets to f_valuestack.
             Frame evaluation usually NULLs it, but a frame that yields sets it
             to the current stack top. */
          PyObject **f_stacktop;
          PyObject *f_trace; /* Trace function */

          /* If an exception is raised in this frame, the next three are used to
           * record the exception info (if any) originally in the thread state. See
           * comments before set_exc_info() -- it's not obvious.
           * Invariant: if _type is NULL, then so are _value and _traceback.
           * Desired invariant: all three are NULL, or all three are non-NULL. That
           * one isn't currently true, but "should be".
           */
          PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;

          PyThreadState *f_tstate;
          int f_lasti; /* Last instruction if called */
          /* Call PyFrame_GetLineNumber() instead of reading this field
             directly. As of 2.3 f_lineno is only valid when tracing is
             active (i.e. when f_trace is set). At other times we use
             PyCode_Addr2Line to calculate the line from the current
             bytecode index. */
          int f_lineno; /* Current line number */
          int f_iblock; /* index in f_blockstack */
          PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
          PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */
      } PyFrameObject;
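    • PyCodeObject and PyFrameObject: PyFrameObject is Python's execution environment, and PyCodeObject is only one part of it. As the definition of PyFrameObject above shows, the frame also holds the namespaces, the stack addresses, the thread state object and a good deal more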

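    • f_builtins, f_globals and f_locals hold the builtins, globals and locals namespaces, each as a mapping of the form (name -> object). The first two are PyDictObjects; according to the comment in the struct, f_locals may be any mapping
    • The PyObject_VAR_HEAD macro shows that PyFrameObject is a variable-length object. The variable part is the last field, f_localsplus, which holds the run-time value stack together with storage tied to the PyCodeObject (for example the slots for cellvars and freevars). Physically the two parts share one contiguous block of memory, but logically they are completely separate and do not interfere with each other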
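Several of the fields described above are also visible from Python code. The following is a minimal sketch, assuming CPython, where `sys._getframe()` returns the current frame object; the helper names `frame_chain_names`, `inner` and `outer` are made up for illustration. It follows the `f_back` links and checks what `f_code`, `f_globals`, `f_builtins` and `f_locals` refer to.

```python
import sys


def frame_chain_names(frame):
    """Follow f_back links from `frame` back to the outermost frame."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def inner():
    # sys._getframe() is CPython-specific and returns the running frame.
    frame = sys._getframe()
    return {
        "chain": frame_chain_names(frame),
        "code_is_function_code": frame.f_code is inner.__code__,
        "globals_is_module_dict": frame.f_globals is globals(),
        "has_builtin_len": "len" in frame.f_builtins,
        "local_names": sorted(frame.f_locals),
    }


def outer():
    return inner()


info = outer()
print(info["chain"][:2])      # names of the two innermost frames
print(info["local_names"])    # names bound in inner() at that moment
```

The chain starts at `inner` and continues to its caller `outer`, which is the Python-level view of the `f_back` pointer in the struct.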
  • ####PyFrame_New

    • As mentioned above, f_localsplus also holds some content tied to the PyCodeObject. What is that content? The following code from PyFrame_New shows it:

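      Py_ssize_t extras, ncells, nfrees;
      ncells = PyTuple_GET_SIZE(code->co_cellvars);
      nfrees = PyTuple_GET_SIZE(code->co_freevars);
      /* total number of slots: value stack + locals + cells + frees */
      extras = code->co_stacksize + code->co_nlocals + ncells + nfrees;
      /* (the code between these two steps is omitted in this excerpt) */
      f->f_code = code;
      /* slots that precede the value stack: locals + cells + frees */
      extras = code->co_nlocals + ncells + nfrees;
      f->f_valuestack = f->f_localsplus + extras;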
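    • So the part of f_localsplus that precedes the value stack holds the slots for the code object's local variables (co_nlocals of them), its co_cellvars and its co_freevars. The cell and free variables are reportedly tied to how closures are implemented; more on that later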
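The quantities used in that arithmetic can be read from any code object. The sketch below is only an illustration: `outer`, `inner` and `slot_summary` are hypothetical names, and the `before_stack` value mirrors the formula from the excerpt above rather than the frame layout of any particular modern CPython release.

```python
def outer(x):
    y = x + 1

    def inner(z):
        # x and y come from the enclosing function, so they are free variables here.
        return x + y + z

    return inner


def slot_summary(code):
    """Mirror the `extras` arithmetic from the PyFrame_New excerpt."""
    ncells = len(code.co_cellvars)
    nfrees = len(code.co_freevars)
    return {
        "nlocals": code.co_nlocals,
        "ncells": ncells,
        "nfrees": nfrees,
        "stacksize": code.co_stacksize,
        # slots that come before the value stack, per the excerpt's formula
        "before_stack": code.co_nlocals + ncells + nfrees,
    }


inner = outer(1)
outer_summary = slot_summary(outer.__code__)
inner_summary = slot_summary(inner.__code__)
print(sorted(outer.__code__.co_cellvars))   # names captured by a nested function
print(sorted(inner.__code__.co_freevars))   # names taken from the enclosing function
```

The names that `outer` lists as cell variables are the same names that `inner` lists as free variables, which is the link between the two tuples and closures.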

  • ####Scope

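    • Normally a name is looked up in LGB order: L for locals, G for globals, B for builtins. When function definitions are nested, the lookup order becomes LEGB, where E stands for enclosing. When a closure object is created, a special namespace containing the names of the scope that encloses it is created as well; that namespace is the E here
    • On the subject of scope, the following Python code raises an exception: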

      a = 1

      def p():
          print(a)  #1
          a = 2     #2
          print(a)  #3

      p()
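    • The exception is raised because, when function p is compiled, the assignment at #2 makes a a local name throughout the body of p. At #1 the name is therefore looked up as a local of p rather than as the global a, but no value has been bound to it yet. Using a name that is visible but not yet bound raises the exception (an UnboundLocalError)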
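Both points in this section can be checked with a short script. This is a sketch for illustration: `legb_demo` and `nested` are made-up names, and `p` is a variant of the function above that returns values instead of printing so that the exception can be caught and inspected.

```python
a = "global a"


def legb_demo():
    b = "enclosing b"

    def nested():
        c = "local c"
        # c is local, b is enclosing, a is global, len is a builtin
        return c, b, a, len.__name__

    return nested()


def p():
    first = a    # 1: 'a' is local to p, but nothing is bound to it yet
    a = 2        # 2: this assignment is what makes 'a' local
    return first, a


try:
    p()
    outcome = "no error"
except UnboundLocalError as exc:
    outcome = type(exc).__name__

print(legb_demo())
print(outcome)
print("a" in p.__code__.co_varnames)   # whether the compiler treats 'a' as a local of p
```

The last line shows that the decision is made when `p` is compiled: `a` appears among the local variable names of the code object even though the module-level `a` exists.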

  • ####Virtual machine execution framework

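    • Initializing the run-time environment, which is quite involved
    • The core is the (recursive) execution of PyEval_EvalFrameEx
    • Set up state such as the PyCodeObject and the stack-top pointer
    • Fetch instructions and their arguments (if any) in a loop
    • Use a switch…case… statement to determine the instruction type and execute it
    • Use the why variable to record the reason for exiting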
      enum why_code {
          WHY_NOT = 0x0001, /* No error */
          WHY_EXCEPTION = 0x0002, /* Exception occurred */
          WHY_RERAISE = 0x0004, /* Exception re-raised by 'finally' */
          WHY_RETURN = 0x0008, /* 'return' statement */
          WHY_BREAK = 0x0010, /* 'break' statement */
          WHY_CONTINUE = 0x0020, /* 'continue' statement */
          WHY_YIELD = 0x0040 /* 'yield' operator */
      };
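The fetch, dispatch and exit-reason steps listed above can be sketched as a toy stack machine. This is not how PyEval_EvalFrameEx is written; the `Why` enum, the `run` function and the (opcode, argument) tuple format are assumptions made for illustration, with opcode names borrowed from CPython bytecode.

```python
from enum import Enum


class Why(Enum):
    """Toy counterpart of why_code: the reason the loop stopped."""
    NOT = 1
    EXCEPTION = 2
    RETURN = 3


def run(program):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    pc = 0                      # index of the next instruction
    why = Why.NOT
    result = None
    while why is Why.NOT:
        if pc >= len(program):
            why = Why.EXCEPTION  # ran off the end without returning
            break
        opcode, arg = program[pc]   # fetch the instruction and its argument
        pc += 1
        if opcode == "LOAD_CONST":  # dispatch on the instruction type
            stack.append(arg)
        elif opcode == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "RETURN_VALUE":
            result = stack.pop()
            why = Why.RETURN
        else:
            why = Why.EXCEPTION  # unknown instruction
    return why, result


program = [
    ("LOAD_CONST", 1),
    ("LOAD_CONST", 2),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
why, result = run(program)
print(why.name, result)
```

The loop keeps running while the status is `Why.NOT` and leaves as soon as an instruction changes it, which is the role the why variable plays in the notes above.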
  • ####Thread model

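    • The Python virtual machine corresponds to the CPU in an operating system: in Python's thread model, the virtual machine is a software implementation of that CPU
    • The thread state object PyThreadState holds a linked list of PyFrameObjects (its frame field points to the current frame, and each frame links to the previous one through f_back), so every thread can execute its own sequence of stack frames independently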

      typedef struct _ts {
          /* See Python/ceval.c for comments explaining most fields */

          struct _ts *next;
          PyInterpreterState *interp;

          struct _frame *frame;
          int recursion_depth;
          /* 'tracing' keeps track of the execution depth when tracing/profiling.
             This is to prevent the actual trace/profile code from being recorded in
             the trace/profile. */
          int tracing;
          int use_tracing;

          Py_tracefunc c_profilefunc;
          Py_tracefunc c_tracefunc;
          PyObject *c_profileobj;
          PyObject *c_traceobj;

          PyObject *curexc_type;
          PyObject *curexc_value;
          PyObject *curexc_traceback;

          PyObject *exc_type;
          PyObject *exc_value;
          PyObject *exc_traceback;

          PyObject *dict; /* Stores per-thread state */

          /* tick_counter is incremented whenever the check_interval ticker
           * reaches zero. The purpose is to give a useful measure of the number
           * of interpreted bytecode instructions in a given thread. This
           * extremely lightweight statistic collector may be of interest to
           * profilers (like psyco.jit()), although nothing in the core uses it.
           */
          int tick_counter;

          int gilstate_counter;

          PyObject *async_exc; /* Asynchronous exception to raise */
          long thread_id; /* Thread id where this tstate was created */

          int trash_delete_nesting;
          PyObject *trash_delete_later;

          /* XXX signal handlers should also be here */

      } PyThreadState;
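    • The process state object PyInterpreterState holds a linked list of PyThreadStates, along with global data such as builtins and modules that the threads share. This matches the operating-system notion of a process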

      typedef struct _is {

          struct _is *next;
          struct _ts *tstate_head;

          PyObject *modules;
          PyObject *sysdict;
          PyObject *builtins;
          PyObject *modules_reloading;

          PyObject *codec_search_path;
          PyObject *codec_search_cache;
          PyObject *codec_error_registry;

      #ifdef HAVE_DLOPEN
          int dlopenflags;
      #endif
      #ifdef WITH_TSC
          int tscdump;
      #endif

      } PyInterpreterState;
    • A Python thread corresponds to an operating-system thread

    • Threads are synchronized with the global interpreter lock (GIL)
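The per-thread frame chain and the shared interpreter-level data can be observed from Python. The sketch below assumes CPython, where `sys._current_frames()` maps each thread id to that thread's current frame; `worker`, `function_names`, `ready`, `stop` and `seen` are names invented for this example.

```python
import sys
import threading

ready = threading.Event()
stop = threading.Event()
seen = {}


def worker():
    # Record the module table this thread sees, then park until told to stop.
    seen["modules"] = sys.modules
    ready.set()
    stop.wait()


def function_names(frame):
    """Walk f_back from a thread's current frame to its outermost frame."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


thread = threading.Thread(target=worker)
thread.start()
ready.wait()

# CPython-specific: thread id -> the frame that thread is currently executing.
frames = sys._current_frames()
worker_names = function_names(frames[thread.ident])
main_names = function_names(frames[threading.get_ident()])

stop.set()
thread.join()

print("worker" in worker_names, "worker" in main_names)
print(seen["modules"] is sys.modules)
```

Each thread has its own chain of frames, while `sys.modules` is the same object in both threads, which matches the split between PyThreadState and PyInterpreterState described above.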
