I am working on a Python project that involves implementing a dynamic programming (DP) algorithm, but the state dependencies are not straightforward. Here's a simplified version of my problem:
I need to calculate the minimum cost to traverse a 2D grid where each cell has a cost, but the movement rules are unusual:
You can move down, right, or diagonally down-right. Moving diagonally has an extra penalty depending on the sum of the costs of the starting and ending cells. Additionally, the cost to move into a cell may depend on whether the previous move was horizontal, vertical, or diagonal. For example: If grid[i][j] is the cost of cell (i, j), then the cost to reach (i, j) from (i-1, j-1) (diagonal) would be:
dp[i][j] = dp[i-1][j-1] + grid[i][j] + penalty_function(grid[i-1][j-1], grid[i][j])
But from (i-1, j) (vertical), it would simply be:
dp[i][j] = dp[i-1][j] + grid[i][j]
I attempted the following approach:
def min_cost(grid):
    rows, cols = len(grid), len(grid[0])
    dp = [[float('inf')] * cols for _ in range(rows)]
    dp[0][0] = grid[0][0]  # Starting point
    # The first row and column are reachable only by straight moves;
    # without these two loops they would stay at infinity.
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + grid[0][j]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + grid[i][0]
    for i in range(1, rows):
        for j in range(1, cols):
            vertical = dp[i - 1][j] + grid[i][j]
            horizontal = dp[i][j - 1] + grid[i][j]
            diagonal = dp[i - 1][j - 1] + grid[i][j] + penalty_function(grid[i - 1][j - 1], grid[i][j])
            dp[i][j] = min(vertical, horizontal, diagonal)
    return dp[-1][-1]
However, this becomes inefficient for larger grids because the penalty function itself can be computationally expensive, and the solution doesn't scale well when the grid size exceeds 1000x1000.
Is there a way to optimize this DP approach, possibly by memoizing or precomputing parts of the penalty function? Would switching to libraries like NumPy or using parallel processing help in this scenario? Are there Python-specific tricks (e.g., @functools.lru_cache, generators) that I could use to improve performance while keeping the code clean and readable?
2 Answers
DP can proceed in two ways.
The first is bottom up. That's harder to write but often more memory-efficient.
The second is top down. Just write a recursive function, then memoize it.
If you're struggling with bottom up, just try top down.
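A minimal top-down sketch for your problem, assuming penalty_function is defined elsewhere (grid stays in a closure, so lru_cache only keys on the cell indices):

from functools import lru_cache

def min_cost_topdown(grid, penalty_function):
    rows, cols = len(grid), len(grid[0])

    @lru_cache(maxsize=None)   # memoize: each (i, j) is computed once
    def best(i, j):
        # Cheapest cost of any valid path from (0, 0) to (i, j).
        if i == 0 and j == 0:
            return grid[0][0]
        candidates = []
        if i > 0:                # vertical move from (i-1, j)
            candidates.append(best(i - 1, j))
        if j > 0:                # horizontal move from (i, j-1)
            candidates.append(best(i, j - 1))
        if i > 0 and j > 0:      # diagonal move from (i-1, j-1), with penalty
            candidates.append(best(i - 1, j - 1)
                              + penalty_function(grid[i - 1][j - 1], grid[i][j]))
        return min(candidates) + grid[i][j]

    return best(rows - 1, cols - 1)

Note that the recursion depth grows to roughly rows + cols, so on a 1000x1000 grid you would need to raise CPython's default recursion limit (sys.setrecursionlimit) or convert the recursion to an explicit stack.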
In your case, though, a 1000x1000 grid requires a million dp values, which you access in strange patterns, so bottom up makes more sense. Instead of keeping only the previous rows (as the other answer suggests), I would keep anti-diagonals. At the worst point an anti-diagonal is the same size as a row, but usually it is smaller, improving cache usage.
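A minimal sketch of that ordering, again assuming penalty_function is defined elsewhere. Cells with i + j == d form anti-diagonal d; a cell on diagonal d depends only on diagonals d-1 (straight moves) and d-2 (the diagonal move), so only two previous diagonals need to be kept:

def min_cost_anti_diagonals(grid, penalty_function):
    rows, cols = len(grid), len(grid[0])
    prev2 = {}                  # dp values on diagonal d-2, keyed by row i
    prev1 = {0: grid[0][0]}     # dp values on diagonal d-1 (starts as d == 0)
    for d in range(1, rows + cols - 1):
        curr = {}
        # Rows i for which (i, d - i) lies inside the grid.
        for i in range(max(0, d - cols + 1), min(rows, d + 1)):
            j = d - i
            best = float('inf')
            if i > 0:                    # vertical from (i-1, j)
                best = min(best, prev1[i - 1])
            if j > 0:                    # horizontal from (i, j-1)
                best = min(best, prev1[i])
            if i > 0 and j > 0:          # diagonal from (i-1, j-1), with penalty
                best = min(best, prev2[i - 1]
                           + penalty_function(grid[i - 1][j - 1], grid[i][j]))
            curr[i] = best + grid[i][j]
        prev2, prev1 = prev1, curr
    return prev1[rows - 1]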
You don't need to store the whole 2D array of dp values: your algorithm only needs the current and previous rows. Using only 2 rows doesn't change the computational complexity, but it does reduce the space complexity, so in practice the smaller memory footprint will probably give a performance boost.
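A minimal sketch of the two-row version, again assuming penalty_function is defined elsewhere:

def min_cost_two_rows(grid, penalty_function):
    rows, cols = len(grid), len(grid[0])
    prev = [0] * cols
    prev[0] = grid[0][0]
    for j in range(1, cols):            # first row: horizontal moves only
        prev[j] = prev[j - 1] + grid[0][j]
    for i in range(1, rows):
        curr = [0] * cols
        curr[0] = prev[0] + grid[i][0]  # first column: vertical moves only
        for j in range(1, cols):
            curr[j] = grid[i][j] + min(
                prev[j],                                  # vertical
                curr[j - 1],                              # horizontal
                prev[j - 1] + penalty_function(grid[i - 1][j - 1], grid[i][j]),  # diagonal
            )
        prev = curr
    return prev[-1]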
Comments:
penalty_function can be JITed/compiled too (otherwise the function call overhead will be the main bottleneck anyway). – Jérôme Richard
pypy code looks just like using a restricted subset of Python, and is plenty fast. – btilly
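Following the comment about JIT compilation, here is one sketch of that approach using Numba (an assumption: the question never names a compiler, and the toy penalty below is a hypothetical stand-in for the real penalty_function):

import numpy as np
from numba import njit

@njit(cache=True)
def penalty_function(a, b):
    return (a + b) * 0.1   # hypothetical stand-in for the real penalty

@njit(cache=True)
def min_cost_jit(grid):
    rows, cols = grid.shape
    dp = np.empty((rows, cols))
    dp[0, 0] = grid[0, 0]
    for j in range(1, cols):                 # first row: horizontal moves only
        dp[0, j] = dp[0, j - 1] + grid[0, j]
    for i in range(1, rows):
        dp[i, 0] = dp[i - 1, 0] + grid[i, 0] # first column: vertical moves only
        for j in range(1, cols):
            dp[i, j] = grid[i, j] + min(
                dp[i - 1, j],                # vertical
                dp[i, j - 1],                # horizontal
                dp[i - 1, j - 1] + penalty_function(grid[i - 1, j - 1], grid[i, j]),
            )
    return dp[rows - 1, cols - 1]

grid = np.random.rand(1000, 1000)
print(min_cost_jit(grid))

Because penalty_function is compiled together with the loop, the per-call overhead disappears; the first call pays a one-time compilation cost.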