Python multiprocessing taking significantly more time than sequential processing using the Multiprocessing module? - Stack Overflow

I am trying to compare the efficiency of the multiprocessing module in Python by performing a CPU-intensive task.

Sequential Task:

import multiprocessing
import time

v1 = [0] * 5000000
v2 = [0] * 5000000

def worker1(nums):
    global v1
    for i in range(nums):
        v1[i] = i*i
    
def worker2(nums):
    global v2
    for i in range(nums):
        v2[i] = i*i*i

start = time.time()
worker1(5000000)
worker2(5000000)
end = time.time()

print(end-start)

Time taken for sequential task - ~ 1 second

The same task using multiprocessing:

import multiprocessing
import time

def worker1(nums,v1):
    for i in range(nums):
        v1[i] = i*i
    
def worker2(nums,v2):
    for i in range(nums):
        v2[i] = i*i*i
  

v1 = multiprocessing.Array('i',5000000)
v2 = multiprocessing.Array('i',5000000)


p1 = multiprocessing.Process(target=worker1, args = (5000000,v1))
p2 = multiprocessing.Process(target=worker2, args = (5000000,v2))

start = time.time()
p1.start()
p2.start()
p1.join()
p2.join()
end = time.time()

print(end-start)

Time taken for the multiprocessing task - ~ 12 seconds

The difference between the two is very significant. Even though I understand that multiprocessing has some overhead, shouldn't it still have been faster than the sequential version?

Please let me know if I am doing something wrong or if there is a silly mistake that should be corrected.

Asked Nov 19, 2024 at 16:01 by Satyam Rai (edited Nov 19, 2024 at 16:09)
  • 3 Copying data between processes has overhead. Multiprocessing is not a "make everything faster" magic bullet -- it only helps when the cost of the overhead is outweighed by the time saved by being able to run code in parallel without the GIL. It makes no sense to try to apply it when you're doing fast operations on large amounts of data -- you pay a heavy cost to transfer all that data, when the time it would have just taken to handle the data inside your existing process is low. – Charles Duffy Commented Nov 19, 2024 at 16:15
  • ...the situation when you want to use multiprocessing is when you're (1) doing a substantial amount of CPU-bound work (2) where that work isn't handled by a C library that self-parallelizes like numpy or scipy or tensorflow (3) where ideally both arguments and return values, but definitely return values, are small (in terms of data size) and fast to serialize/deserialize/transfer. Unless ALL of those conditions are met, multiprocessing is not an appropriate tool. – Charles Duffy Commented Nov 19, 2024 at 16:21
  • @AhmedAEK, reopened -- sounds like you're well-positioned to add a good answer. – Charles Duffy Commented Nov 19, 2024 at 18:26

1 Answer

Python's multiprocessing.Array has lock=True by default, so every write acquires and releases a mutex (and can flush CPU caches). This alone accounts for about 11 of the 12 seconds of the multiprocessing version; using multiprocessing.Array('i', 5000000, lock=False) brings it down to roughly 1 second.
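
To see what lock=False changes, here is a minimal standalone sketch (my own illustration, not part of the original answer) of the two kinds of objects you get back:

import multiprocessing

# Default: the array is wrapped in a synchronized proxy, and every
# element assignment acquires and releases a shared (recursive) lock.
locked = multiprocessing.Array('i', 10)            # lock=True by default
print(type(locked))                                # SynchronizedArray wrapper
print(locked.get_lock())                           # the underlying RLock

# lock=False: you get the bare shared ctypes array, with no per-write locking.
unlocked = multiprocessing.Array('i', 10, lock=False)
print(type(unlocked))                              # a plain c_int array in shared memory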

With that change, 2 processes take about the same time as 1 process doing the same work. The remaining culprit is that we are also comparing a plain list against a multiprocessing.Array; if we use multiprocessing.Array for the single-process version too, we get:

0.8543 1 process, list
1.2004 1 process, Array
0.8488 2 process, Array

multiprocessing.Array is slower than list because a list stores pointers to Python integer objects, while Array has to unbox each object to obtain the underlying integer value and write it into the C array. Remember that Python integers have arbitrary precision; in fact, if you replace multiprocessing.Array with array.array you get an OverflowError, and the data that was written to the multiprocessing.Array is not even correct.
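
As a quick check of that overflow claim, a minimal demonstration (my own sketch, not from the original answer; 4_999_999 is the largest value the loops write a cube of):

import array
import multiprocessing

big = 4_999_999 ** 3                     # ~1.25e20, far outside the int32 range

a = array.array('i', [0])
try:
    a[0] = big                           # array.array range-checks the value
except OverflowError as exc:
    print("array.array raised:", exc)

shared = multiprocessing.Array('i', 1, lock=False)
shared[0] = big                          # ctypes silently keeps only the low 32 bits
print("multiprocessing.Array stored:", shared[0])

The full comparison script: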

import multiprocessing
import time

def worker1(nums, v1):
    for i in range(nums):
        v1[i] = i * i


def worker2(nums, v2):
    for i in range(nums):
        v2[i] = i * i * i

def one_process_list():
    v1 = [0] * 5000000
    v2 = [0] * 5000000

    def worker1(nums):
        for i in range(nums):
            v1[i] = i * i

    def worker2(nums):
        for i in range(nums):
            v2[i] = i * i * i

    start = time.time()
    worker1(5000000)
    worker2(5000000)
    end = time.time()

    print(f"{end-start:.4} 1 process, list")

def one_process_array():
    v1 = multiprocessing.Array('i', 5000000, lock=False)
    v2 = multiprocessing.Array('i', 5000000, lock=False)
    
    start = time.time()
    worker1(5000000, v1)
    worker2(5000000, v2)
    end = time.time()
    print(f"{end - start:.4} 1 process, Array")

def two_process_array():
    v1 = multiprocessing.Array('i', 5000000, lock=False)
    v2 = multiprocessing.Array('i', 5000000, lock=False)

    p1 = multiprocessing.Process(target=worker1, args=(5000000, v1))
    p2 = multiprocessing.Process(target=worker2, args=(5000000, v2))

    start = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    end = time.time()

    print(f"{end - start:.4} 2 process, Array")

if __name__ == "__main__":
    one_process_list()
    one_process_array()
    two_process_array()

One way around this boxing is to wrap the shared memory in a numpy array (see Sharing contiguous numpy arrays between processes in python); this way the operations run directly in C without boxing.

import multiprocessing
import time
from multiprocessing.sharedctypes import RawArray

import numpy as np

def worker_numpy(nums, v1_raw):
    v1 = np.frombuffer(v1_raw, dtype=np.int32)
    v1[:] = np.arange(nums) ** 2 # iterates a 40 MB array 3 times !

def two_process_numpy():
    my_dtype = np.int32

    def create_shared_array(size, dtype=np.int32):
        dtype = np.dtype(dtype)
        if dtype.isbuiltin and dtype.char in 'bBhHiIlLfd':
            typecode = dtype.char
        else:
            typecode, size = 'B', size * dtype.itemsize

        return RawArray(typecode, size)

    v1 = create_shared_array(5000000, dtype=my_dtype)
    v2 = create_shared_array(5000000, dtype=my_dtype)

    p1 = multiprocessing.Process(target=worker_numpy, args=(5000000, v1))
    p2 = multiprocessing.Process(target=worker_numpy, args=(5000000, v2))

    start = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    end = time.time()

    print(f"{end - start:.4} 2 process, numpy")

0.8543 1 process, list
1.2004 1 process, Array
0.8488 2 process, Array
0.2774 2 process, numpy
0.0543 1 process, numpy

With numpy, essentially the entire time is spent spawning the 2 extra processes. Note that you may get different timings on Linux, where fork is cheaper than spawn is on Windows, but the relative ordering won't change. Also, since numpy drops the GIL, we can use multithreading instead to parallelize this code.
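
For completeness, a minimal sketch of that thread-based alternative (my own illustration, assuming the same 5,000,000-element workload; int64 is used so the cubes don't overflow):

import threading
import time

import numpy as np

def square_into(v):
    # numpy releases the GIL inside these vectorized operations,
    # so the two threads can genuinely run in parallel.
    v[:] = np.arange(v.shape[0], dtype=np.int64) ** 2

def cube_into(v):
    v[:] = np.arange(v.shape[0], dtype=np.int64) ** 3

v1 = np.zeros(5_000_000, dtype=np.int64)
v2 = np.zeros(5_000_000, dtype=np.int64)

t1 = threading.Thread(target=square_into, args=(v1,))
t2 = threading.Thread(target=cube_into, args=(v2,))

start = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
print(f"{time.time() - start:.4f} 2 threads, numpy")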
