python - Any way to create a subset (or partially clone) from a conda environment? - Stack Overflow

For example, I have a working environment with 300 packages. Is there a clean way to duplicate or clone from this environment with only 100 packages, without the need to resolve/download dependencies?

I could manually copy the folder and delete a bunch of stuff that is not needed, but a "subcloning" method would be ideal.

asked Nov 20, 2024 at 22:26 by prusswan
  • 1 My best guess would be to export a YAML file for the environment: saturncloud.io/blog/…, delete packages you don't need, and create a new environment. Conda will hardlink files if possible to save on disk space. – darthbith Commented Nov 20, 2024 at 23:37
  • 1 Have you looked into docs.anaconda/working-with-conda/packages/shared-pkg-cache – Tzane Commented Nov 21, 2024 at 14:21
  • @Tzane that might work but is conda install/create still needed? Although saving space is one of the reasons, the main reason is to skip the dependency checking and pinging the channels. I found docs.conda.io/projects/conda-build/en/latest/concepts/… but not sure if it will need the official channels for checking dependencies. – prusswan Commented Nov 21, 2024 at 18:33
  • How are you deciding which 100 packages to keep? Each package was installed either because you asked for it, or because it was a requirement for a package you asked for. – James Commented Nov 22, 2024 at 14:20
  • @James I can specify it with a list, but the key idea is that I want to reuse files from the master environment without going through any dependency checking (or at least, version checking) – prusswan Commented Nov 22, 2024 at 14:41

1 Answer

You have a very large environment and want a clone of it that contains fewer packages. Each package you installed also brings in a chain of dependencies, so to get a smaller environment you need to drop some of the top-level packages that you added manually to your environment spec.

The first step is to get a list of all the packages you specifically added, along with their current specs. This script reads the manually added and removed packages from the environment's history, then looks up the exact versions that are currently installed in the conda environment. Run it with the source environment activated, since it relies on CONDA_PREFIX. It prints a spec file that can be used with the conda create command:

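import ast
import json
import os
import platform
import re
import subprocess
from pathlib import Path


def spec_name(spec):
    # Drop an optional "channel::" prefix, then cut at the first version/build delimiter.
    return re.split(r'[\s=<>!~\[]', spec.split('::')[-1], maxsplit=1)[0]


# conda logs every explicit request for this environment in conda-meta/history.
history_file = Path(os.environ['CONDA_PREFIX']) / 'conda-meta' / 'history'
manual_specs = set()
for line in history_file.read_text().splitlines():
    if line.startswith('# update specs'):
        # The text after the colon is a Python list literal; parse it without eval().
        specs = ast.literal_eval(line.split(':', 1)[-1].strip())
        manual_specs.update(spec_name(spec) for spec in specs)
    elif line.startswith('# remove specs'):
        # A later removal cancels an earlier manual install of the same package.
        specs = ast.literal_eval(line.split(':', 1)[-1].strip())
        manual_specs.difference_update(spec_name(spec) for spec in specs)

# Passing a list with shell=True only works on Windows, so enable it only there.
shell = platform.system() == 'Windows'
res = subprocess.run(['conda', 'list', '-e', '--json'],
                     capture_output=True, shell=shell, check=True)
env_specs = json.loads(res.stdout)
# Keep only the installed packages that were requested explicitly.
required_specs = [s for s in env_specs if s['name'] in manual_specs]

# print a shortened env file:
print('# This file may be used to create an environment using:')
print('# conda create --name <env> --file <this file>')
for s in required_specs:
    print(f"{s['name']}={s['version']}={s['build_string']}")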

For my environment it prints:

# This file may be used to create an environment using:
# conda create --name <env> --file <this file>
datasets=2.19.0=pyhd8ed1ab_0
dill=0.3.8=pyhd8ed1ab_0
geos=3.12.2=h5a68840_0
h5py=3.11.0=nompi_py310hde4a0ea_100
ipython=8.22.2=pyh7428d3b_0
jedi=0.18.2=pyhd8ed1ab_0
lxml=5.2.1=py310hdccf185_0
matplotlib=3.5.3=py310haa95532_0
more-itertools=10.2.0=pyhd8ed1ab_0
multidict=6.0.5=py310h8d17308_0
opencv=4.8.1=py310h2d39e71_5
pandas=2.2.2=py310hecd3228_0
polars=0.20.23=pypi_0
protobuf=4.24.4=py310h19be30a_0
psutil=5.9.8=py310h8d17308_0
pyarrow=15.0.0=py310hd0bb7c2_0_cpu
pyasn1=0.4.8=pyhd3eb1b0_0
pycurl=7.45.3=py310h3f729d1_0
pyopenssl=24.0.0=py310haa95532_0
python=3.10.14=h4de0772_0_cpython
pythonnet=3.0.1=py310haa95532_0
pytorch=2.2.0=cpu_py310hb0bdfb8_0
requests=2.31.0=pyhd8ed1ab_0
ruamel.yaml=0.17.21=py310h2bbff1b_0
scikit-image=0.22.0=py310hecd3228_2
scikit-learn=1.4.2=py310hfd2573f_0
shapely=2.0.5=py310ha804f92_0
stanza=1.9.2=pyhd8ed1ab_0
tqdm=4.66.2=pyhd8ed1ab_0
transformers=4.29.2=pyhd8ed1ab_0
wincertstore=0.2=py310haa95532_2

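At this point you can copy/paste this into a file, or modify the script to write to a file (here I call it env.spec), and remove any packages you do not want in your environment. Entries whose build string is pypi_0 were installed with pip, so they will most likely not resolve from a conda channel and are better removed and reinstalled with pip afterwards.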
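As a rough sketch of the "write to a file" variant: this assumes the required_specs list from the script above (dicts with name, version and build_string keys). The helper name write_spec_file and its unwanted argument are made up for illustration and are not part of conda.

```python
import tempfile
from pathlib import Path


def write_spec_file(specs, path, unwanted=()):
    """Write name=version=build lines to `path`, skipping unwanted packages.

    `specs` is assumed to be shaped like the entries of `conda list --json`.
    Returns the package lines that were kept.
    """
    unwanted = set(unwanted)
    header = [
        "# This file may be used to create an environment using:",
        "# conda create --name <env> --file <this file>",
    ]
    kept = [
        f"{s['name']}={s['version']}={s['build_string']}"
        for s in specs
        if s["name"] not in unwanted
    ]
    Path(path).write_text("\n".join(header + kept) + "\n")
    return kept


# Two entries copied from the listing above; "dill" is dropped on purpose.
sample = [
    {"name": "dill", "version": "0.3.8", "build_string": "pyhd8ed1ab_0"},
    {"name": "tqdm", "version": "4.66.2", "build_string": "pyhd8ed1ab_0"},
]
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "env.spec"
    kept = write_spec_file(sample, target, unwanted={"dill"})
    written = target.read_text()
print(kept)
```

In the real script you would call it as write_spec_file(required_specs, "env.spec", unwanted={...}) instead of printing each line.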

Then you can create a clone containing just that subset of packages by using the file:

conda create -n myenv -c conda-forge --file path/to/env.spec --use-index-cache --offline

This will run the solver once for the packages you are installing, but it will use only your currently cached index. Note: it is really important to include ALL of the channels you used in your original environment, for example -c conda-forge -c intel -c pytorch.
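One way to avoid missing a channel is to read the channel field that conda list --json reports for each package. A minimal sketch, assuming env_specs is that parsed JSON list as in the script above. The helper name channel_args and the sample data are hypothetical. Entries reported as pypi are skipped because they come from pip rather than a conda channel.

```python
def channel_args(env_specs):
    """Return ['-c', channel, ...] for each distinct channel, in first-seen order."""
    seen = []
    for s in env_specs:
        channel = s.get("channel")
        # Skip missing channels, pip installs, and channels already collected.
        if channel and channel != "pypi" and channel not in seen:
            seen.append(channel)
    args = []
    for channel in seen:
        args += ["-c", channel]
    return args


# Made-up sample in the shape of `conda list --json` entries.
sample = [
    {"name": "requests", "channel": "conda-forge"},
    {"name": "pytorch", "channel": "pytorch"},
    {"name": "polars", "channel": "pypi"},
    {"name": "tqdm", "channel": "conda-forge"},
]
print(" ".join(channel_args(sample)))  # -c conda-forge -c pytorch
```

Channel names are passed through exactly as conda reports them, so it is worth checking the result against the channels you actually configured before pasting it into the conda create command.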

For faster solving, you can add the --solver libmamba flag if libmamba is not already your default solver.
