asyncio Concurrent Programming (Part 2)

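Using ThreadPoolExecutor with asyncio for blocking IO requests

In this section we look at how to combine a thread pool with asyncio.

Even in coroutine-based code we sometimes still need threads. So when do we need them?

Blocking IO must not be called inside a coroutine, because it blocks the whole event loop. Sometimes, though, a blocking IO operation is unavoidable (for example, a library that only offers a blocking API). Whenever blocking IO has to be integrated into coroutine code, we run it in a separate thread.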
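To see why blocking calls don't belong in a coroutine, here is a minimal sketch that uses time.sleep() as a stand-in for blocking IO (an assumption for illustration; the 0.2-second delay is arbitrary). Three coroutines that call time.sleep() run one after another, because each call blocks the whole event loop, while the same coroutines using await asyncio.sleep() overlap.

```python
import asyncio
import time


async def blocking_job(delay):
    # time.sleep() blocks the thread, so the event loop cannot switch to other coroutines
    time.sleep(delay)


async def non_blocking_job(delay):
    # asyncio.sleep() suspends only this coroutine and hands control back to the event loop
    await asyncio.sleep(delay)


async def run_three(job, delay=0.2):
    start = time.perf_counter()
    await asyncio.gather(job(delay), job(delay), job(delay))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(run_three(blocking_job))
non_blocking_elapsed = asyncio.run(run_three(non_blocking_job))
print("blocking: {:.2f}s, non-blocking: {:.2f}s".format(blocking_elapsed, non_blocking_elapsed))
```

The blocking version takes roughly the sum of the delays, the non-blocking one roughly the largest single delay.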

import asyncio
from concurrent.futures import ThreadPoolExecutor
import socket
from urllib.parse import urlparse


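def get_url(url):
    # Request the HTML page over a raw (blocking) socket
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    # Create the TCP socket
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    client.connect((host, 80))  # blocks this thread, but waiting does not consume CPU

    # (With a non-blocking socket we would instead have to poll the connection state
    # in a while loop, doing computation or starting other connections in between.)

    # sendall() keeps sending until the whole request has been written
    client.sendall("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))

    data = b""
    while True:
        d = client.recv(1024)
        if d:
            data += d
        else:
            break

    data = data.decode("utf8")
    # Everything after the first blank line is the response body
    html_data = data.split("\r\n\r\n", 1)[1]
    print(html_data)
    client.close()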


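if __name__ == "__main__":
    import time
    start_time = time.time()

    # Create an event loop explicitly (calling asyncio.get_event_loop()
    # with no running loop is deprecated in recent Python versions)
    loop = asyncio.new_event_loop()

    # Get a thread-pool executor
    executor = ThreadPoolExecutor()

    # We can also limit the number of worker threads, e.g.:
    # executor = ThreadPoolExecutor(max_workers=3)

    # Issue 20 requests concurrently
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)

        # Run the blocking function in the thread pool; this returns an asyncio Future
        task = loop.run_in_executor(executor, get_url, url)
        tasks.append(task)

    loop.run_until_complete(asyncio.wait(tasks))
    executor.shutdown()
    loop.close()
    print("last time:{}".format(time.time()-start_time))
# Output
last time:2.110485076904297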

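The code above creates a thread pool and runs the blocking function in it, so the 20 blocking requests proceed concurrently in worker threads while the event loop waits on the returned futures.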
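The same pattern can be tried without network access. In this sketch the hypothetical fake_blocking_io() uses time.sleep() to stand in for a blocking socket call such as get_url(); with max_workers=5, five 0.2-second calls overlap instead of running back to back.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fake_blocking_io(n):
    # Hypothetical stand-in for blocking IO: it blocks only its own worker thread
    time.sleep(0.2)
    return n * n


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Each call runs in a worker thread; run_in_executor returns an asyncio Future
        futures = [loop.run_in_executor(executor, fake_blocking_io, n) for n in range(5)]
        # gather() waits for all futures and returns results in submission order
        return await asyncio.gather(*futures)


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, "{:.2f}s".format(elapsed))
```

Lowering max_workers to 1 would make the five calls run one after another, which is how the pool size controls concurrency.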

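Here is the relevant source (abridged from asyncio in an older Python version; details differ in newer releases):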

# BaseEventLoop.run_in_executor (asyncio/base_events.py)
def run_in_executor(self, executor, func, *args):
    self._check_closed()
    if self._debug:
        self._check_callback(func, 'run_in_executor')
    if executor is None:
        # Even if we don't pass an executor, the loop creates a default one
        executor = self._default_executor
        if executor is None:
            executor = concurrent.futures.ThreadPoolExecutor()
            self._default_executor = executor

    # Finally, submit the blocking function to the thread pool and wrap the
    # resulting concurrent.futures.Future in an asyncio Future
    return futures.wrap_future(executor.submit(func, *args), loop=self)


# asyncio/futures.py
def wrap_future(future, *, loop=None):
    """Wrap concurrent.futures.Future object."""
    if isfuture(future):
        return future
    assert isinstance(future, concurrent.futures.Future), \
        'concurrent.futures.Future is expected, got {!r}'.format(future)
    if loop is None:
        loop = events.get_event_loop()
    new_future = loop.create_future()
    # Copy the result/exception of the thread-pool future into the asyncio future
    _chain_future(future, new_future)
    return new_future
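The wrap_future() function above is also exposed publicly as asyncio.wrap_future(). This minimal sketch does by hand what run_in_executor() does internally: submit a function to a thread pool, then wrap the resulting concurrent.futures.Future so a coroutine can await it. The function slow_add() is hypothetical.

```python
import asyncio
import concurrent.futures
import time


def slow_add(a, b):
    # Hypothetical blocking function executed in a worker thread
    time.sleep(0.1)
    return a + b


async def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        cf = executor.submit(slow_add, 1, 2)   # concurrent.futures.Future (not awaitable)
        af = asyncio.wrap_future(cf)           # asyncio Future chained to cf
        print(type(cf).__name__, isinstance(af, asyncio.Future))
        return await af                        # the asyncio Future can be awaited


result = asyncio.run(main())
print(result)
```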

Whenever we need to call blocking IO from a coroutine, we can offload it to a thread pool in this way.
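Since Python 3.9, asyncio.to_thread() offers a shorter way to do this from inside a coroutine: it runs the function in the loop's default executor, roughly like loop.run_in_executor(None, ...). In this sketch read_config() is a hypothetical blocking function; while it blocks a worker thread, another coroutine keeps making progress on the event loop.

```python
import asyncio
import time


def read_config(name):
    # Hypothetical blocking call (e.g. file or socket IO), simulated with time.sleep()
    time.sleep(0.2)
    return "config:" + name


async def ticker(events):
    # Keeps running on the event loop while read_config() blocks a worker thread
    for i in range(3):
        events.append("tick{}".format(i))
        await asyncio.sleep(0.05)


async def main():
    events = []
    config, _ = await asyncio.gather(
        asyncio.to_thread(read_config, "app"),  # blocking call offloaded to a thread
        ticker(events),
    )
    return config, events


config, events = asyncio.run(main())
print(config, events)
```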

Simulating HTTP requests with asyncio

In asyncio, essentially every asynchronous operation is represented by a future.

import asyncio
from urllib.parse import urlparse


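async def get_url(url):

    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    # Open the socket connection as a coroutine; returns a (StreamReader, StreamWriter) pair
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))
    await writer.drain()  # wait until the request has been flushed
    all_lines = []

    # Read the response line by line until the server closes the connection
    async for raw_line in reader:
        data = raw_line.decode("utf8")
        all_lines.append(data)
    writer.close()
    await writer.wait_closed()
    # Each line already ends with "\r\n", so join without adding separators
    html = "".join(all_lines)
    return html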


async def main():
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)
        # Wrap each coroutine in a future (a Task) and collect it
        tasks.append(asyncio.ensure_future(get_url(url)))

    # Print each result as soon as it completes; as_completed() yields awaitables in completion order
    for task in asyncio.as_completed(tasks):
        result = await task
        print(result)


if __name__ == "__main__":
    import time

    start_time = time.time()

    asyncio.run(main())

    print('last time:{}'.format(time.time() - start_time))

Alternatively, without as_completed, we can simply wait for all the tasks (use this entry point instead of the one above):

if __name__ == "__main__":
    import time

    start_time = time.time()

    loop = asyncio.new_event_loop()
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)
        # Since Python 3.11, asyncio.wait() only accepts tasks/futures, not bare coroutines
        tasks.append(loop.create_task(get_url(url)))

    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()

    print('last time:{}'.format(time.time() - start_time))

The overall process is exactly the same as in the version we implemented earlier.
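Since the shop.projectsedu.com URLs may not be reachable, here is a self-contained sketch that starts a tiny local HTTP-like server with asyncio.start_server() and fetches three paths with asyncio.open_connection(). The handler, the paths, and the per-path delays are all made up for illustration; the point is that as_completed() hands back responses in the order they finish, not the order they were requested.

```python
import asyncio

# Hypothetical per-path delays used by the toy server below
DELAYS = {"/slow": 0.3, "/medium": 0.2, "/fast": 0.1}


async def handle(reader, writer):
    # Toy server: read the request line, skip the headers, wait, then answer and close
    request_line = await reader.readline()
    path = request_line.decode("utf8").split()[1]
    while (await reader.readline()) not in (b"\r\n", b""):
        pass
    await asyncio.sleep(DELAYS[path])
    body = "body of " + path
    writer.write("HTTP/1.1 200 OK\r\nConnection: close\r\n\r\n{}".format(body).encode("utf8"))
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def get_url(host, port, path):
    reader, writer = await asyncio.open_connection(host, port)
    writer.write("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))
    await writer.drain()
    data = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    return data.decode("utf8").split("\r\n\r\n", 1)[1]


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: pick any free port
    port = server.sockets[0].getsockname()[1]
    async with server:
        tasks = [asyncio.create_task(get_url("127.0.0.1", port, p)) for p in DELAYS]
        results = []
        for next_done in asyncio.as_completed(tasks):
            results.append(await next_done)
    return results


results = asyncio.run(main())
print(results)
```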

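Future and Task

A future is a result container: the result of an operation is put into the future, and once the future is done its callbacks are run, much like the Future returned by a thread pool. Task is a subclass of Future.

Let's look at one particular method, set_result(), and the _schedule_callbacks() method it calls: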
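A minimal sketch of a Future used as a plain result container: main() waits on it while the illustrative producer() coroutine fills it in with set_result(). It also checks that a Task really is an instance of Future.

```python
import asyncio


async def producer(fut):
    await asyncio.sleep(0.05)
    fut.set_result("done")  # put the result into the container; wakes up whoever awaits fut


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()                  # empty result container, still pending
    task = asyncio.create_task(producer(fut))
    result = await fut                          # suspends until set_result() is called
    await task
    return result, fut.done(), isinstance(task, asyncio.Future)


result, is_done, task_is_future = asyncio.run(main())
print(result, is_done, task_is_future)
```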

# asyncio/futures.py (abridged, from an early Python 3 version)
class Future:
    """This class is *almost* compatible with concurrent.futures.Future.

    Differences:

    - result() and exception() do not take a timeout argument and
      raise an exception when the future isn't done yet.

    - Callbacks registered with add_done_callback() are always called
      via the event loop's call_soon_threadsafe().

    - This class is not compatible with the wait() and as_completed()
      methods in the concurrent.futures package.

    (In Python 3.4 or later we may be able to unify the implementations.)
    """

    def set_result(self, result):
        """Mark the future done and set its result.

        If the future is already done when this method is called, raises
        InvalidStateError.
        """
        if self._state != _PENDING:
            raise InvalidStateError('{}: {!r}'.format(self._state, self))
        self._result = result
        self._state = _FINISHED
        # After storing the result, schedule the done callbacks
        self._schedule_callbacks()

    def _schedule_callbacks(self):
        """Internal: Ask the event loop to call all callbacks.

        The callbacks are scheduled to be called as soon as possible. Also
        clears the callback list.
        """
        callbacks = self._callbacks[:]
        if not callbacks:
            return

        self._callbacks[:] = []
        # asyncio is single-threaded, so call_soon() only puts each callback
        # into the event loop's queue; the loop later takes it from the queue
        # and runs it. Otherwise this works much like the thread-pool future.
        for callback in callbacks:
            self._loop.call_soon(callback, self)
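The effect of call_soon() in _schedule_callbacks() can be observed directly: a callback added with add_done_callback() does not run inside set_result(); it runs only once the event loop gets control back. A small sketch:

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    log = []
    # The callback receives the future itself, as in self._loop.call_soon(callback, self)
    fut.add_done_callback(lambda f: log.append("callback saw " + f.result()))

    fut.set_result("ok")
    log.append("after set_result")   # the callback has only been queued so far
    await asyncio.sleep(0)           # yield to the loop so it can run the queued callback
    log.append("after sleep(0)")
    return log


log = asyncio.run(main())
print(log)
```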

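Why do we need a Task object?

In fact, Task is an important bridge between coroutines and futures.

Let's look at what the Task code actually does.

We know that after a (generator-based) coroutine is created, it must be primed once with next() or send(None) before it can be driven.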
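For a generator-based coroutine, this priming step looks like the sketch below (averager() is an illustrative example): the first next() runs the body up to the first yield, and only then can values be sent in.

```python
def averager():
    # Generator-based coroutine: receives numbers via send() and yields the running average
    total, count, average = 0, 0, None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count


coro = averager()
first = next(coro)   # prime the coroutine: run up to the first yield (same as coro.send(None))
a = coro.send(10)    # now send() can deliver values
b = coro.send(20)
print(first, a, b)

fresh = averager()
try:
    fresh.send(10)   # sending a non-None value to an unstarted generator fails
except TypeError as exc:
    error = str(exc)
print(error)
```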

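From the source we can see that a Task schedules its _step method (via loop.call_soon()) when it is initialized, and this method does two essential things.

First, it starts the coroutine:

Coroutines differ from threads in that a coroutine has to go through an explicit start-up step. Threads don't, because they are scheduled by the operating system. Coroutines, however, are scheduled by the program itself, so the start-up problem has to be solved somewhere. To solve it, the Task abstraction was introduced: initializing a Task starts the coroutine.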
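In asyncio this start-up is handled for us: creating a Task schedules the coroutine's first step on the loop, so it starts as soon as the loop gets control, even before anything awaits it. A bare coroutine object, by contrast, does nothing until it is awaited. A sketch (work() is illustrative):

```python
import asyncio

started = []


async def work(name):
    started.append(name)     # records the moment the coroutine body actually starts
    await asyncio.sleep(0)
    return name


async def main():
    bare = work("bare coroutine")               # coroutine object: body not started yet
    task = asyncio.create_task(work("task"))    # Task schedules its first step via the loop
    await asyncio.sleep(0)                      # give the loop one turn
    before_await = list(started)
    await bare                                  # only now does the bare coroutine run
    await task
    return before_await


before_await = asyncio.run(main())
print(before_await, started)
```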

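Second, it stores the coroutine's return value as its result:

When the coroutine finishes, it raises StopIteration; the task catches it and calls set_result() to store the coroutine's return value (StopIteration.value) as its result. Threads have no equivalent of this StopIteration mechanism.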
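Both jobs can be sketched with a toy class that drives a coroutine by hand. ToyTask is hypothetical and greatly simplified (no event loop, no real futures, no cancellation); it only shows how a task can call send(None) repeatedly and, when StopIteration is raised, keep StopIteration.value as its result.

```python
class Yield:
    # Minimal awaitable that suspends the coroutine once (like asyncio.sleep(0))
    def __await__(self):
        yield


async def compute():
    await Yield()
    await Yield()
    return 42  # becomes StopIteration.value when the coroutine finishes


class ToyTask:
    """Hypothetical, simplified task: drives a coroutine and captures its return value."""

    def __init__(self, coro):
        self._coro = coro
        self.steps = 0
        self.done = False
        self.result = None

    def step(self):
        try:
            self._coro.send(None)      # start or resume the coroutine
            self.steps += 1
        except StopIteration as exc:
            # Analogous to the real Task calling set_result() with the return value
            self.done = True
            self.result = exc.value

    def run(self):
        # A real Task is stepped by the event loop via call_soon(); here we just loop
        while not self.done:
            self.step()
        return self.result


task = ToyTask(compute())
print(task.run(), task.steps)
```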

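The Task object was created to keep the coroutine interface consistent with the thread (future) interface, and it handles exactly the places where coroutines and threads differ.

The figure in the previous article visualizes the code above.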
