BaryonForge.utils.Parallelize module
- class BaryonForge.utils.Parallelize.SimpleParallel(Runner_list, njobs=-1)[source]
Bases: object
A class to execute a list of Runner objects in parallel using a joblib instance.
The SimpleParallel class allows for the parallel execution of tasks encapsulated by Runner objects. It utilizes joblib’s parallel processing capabilities to distribute the workload across multiple CPU cores. This class is particularly useful for running computationally intensive tasks concurrently, thereby reducing the overall processing time.
- Parameters:
- single_run(i, Runner)[source]
Executes the process() method of a single Runner and returns its index and output.
- process()[source]
Executes all Runners in the Runner_list in parallel, returning their outputs in the original order.
Examples
>>> runners = [Runner1(), Runner2(), Runner3()]
>>> parallel_executor = SimpleParallel(runners, njobs=2)
>>> results = parallel_executor.process()
Notes
The Runner objects must have a process() method implemented. This method will be called during the parallel execution.
The number of parallel jobs will be adjusted to match the number of Runners if there are fewer Runners than available CPUs.
Pickling errors can occur if the cosmology object (which is stored as an attribute in many classes) contains SwigPyObjects, which are not pickleable. This happens with P(k) calculations. The baryonification pipeline natively destroys these problem objects. If you still encounter pickling errors, explicitly check that these objects have been removed before the Runners are dispatched.
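One way to carry out such an explicit check is to try pickling each attribute of a Runner before handing it to the parallel executor. The following is a minimal sketch using only the standard library. The helper name `find_unpicklable_attributes` and the `DemoRunner` class are hypothetical and not part of BaryonForge, and a lambda is used here purely as a stand-in for an unpickleable SwigPyObject.

```python
import pickle


def find_unpicklable_attributes(obj):
    """Return the names of instance attributes of obj that fail to pickle.

    Hypothetical helper for illustration; not part of BaryonForge.
    """
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            # Any failure here would also break dispatch to worker processes.
            bad.append(name)
    return bad


class DemoRunner:
    """Hypothetical Runner with one pickleable and one unpickleable attribute."""

    def __init__(self):
        self.mass_def = "200c"
        # Stand-in for a SwigPyObject: the standard pickle module
        # cannot serialize a lambda.
        self.cosmo = lambda k: k


problem_attributes = find_unpicklable_attributes(DemoRunner())
print(problem_attributes)
```

Running a check like this on each Runner in the list should point to the attribute that still holds the offending object.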
See also
joblib.Parallel
The underlying parallel execution library used for running the tasks.
joblib.delayed
A function to wrap the tasks to be executed in parallel.
- single_run(i, Runner)[source]
Executes the process() method of a single Runner and returns its index and output.
This method is used to run a single Runner object. It returns the index of the Runner and its output, which allows for the ordered aggregation of results.
- process()[source]
Executes all Runners in the Runner_list in parallel, returning their outputs in the original order.
This method uses joblib’s Parallel and delayed functions to run each Runner’s process() method in parallel. The outputs are collected and sorted based on the original order of the Runner_list.
- Returns:
A list of outputs from the process() methods of each Runner, in the order they were provided in Runner_list.
- Return type:
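The index-tagging pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the BaryonForge implementation: it uses `ThreadPoolExecutor` from the standard library as a stand-in for joblib so that it runs without extra dependencies, and `SquareRunner` and `run_in_order` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class SquareRunner:
    """Hypothetical Runner; only the process() method matters here."""

    def __init__(self, value):
        self.value = value

    def process(self):
        return self.value ** 2


def single_run(i, runner):
    # Return the index together with the output so results can be reordered.
    return i, runner.process()


def run_in_order(runner_list, njobs=2):
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        futures = [pool.submit(single_run, i, r) for i, r in enumerate(runner_list)]
        # as_completed yields in completion order, which may differ from input order.
        tagged = [f.result() for f in as_completed(futures)]
    # Sort on the index to restore the original order of runner_list.
    tagged.sort(key=lambda pair: pair[0])
    return [output for _, output in tagged]


results = run_in_order([SquareRunner(v) for v in (3, 1, 2)])
print(results)  # [9, 1, 4]
```

Carrying the index alongside each output keeps the aggregation independent of the order in which workers happen to finish.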
- class BaryonForge.utils.Parallelize.SplitJoinParallel(Runner, njobs=-1, seed=42)[source]
Bases: object
A class to split a single Runner task into multiple parallel tasks and join the results.
The SplitJoinParallel class takes a single Runner object and splits its task into multiple smaller tasks to be run in parallel. It uses joblib for parallel execution, and the results are combined (joined) after all parallel tasks are completed. This approach is particularly useful for handling large datasets or computationally intensive tasks by distributing the workload across multiple CPU cores.
This class is currently not designed to work with the Baryonification operation and will raise an error if it is passed any Baryonification runner.
- Parameters:
Runner (object) – A Runner object that defines the task to be executed. This object must have methods and attributes necessary for splitting the task, such as HaloLightConeCatalog, LightconeShell, cosmo, etc.
seed (int) – A seed value for initializing the random number generator to ensure reproducibility when shuffling the halo catalog (shuffling is done to load-balance the jobs). Default is 42.
njobs (int, optional) – The number of jobs (processes) to run in parallel. If set to -1, the number of jobs is set to the number of CPUs available. Default is -1.
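How an `njobs` value of -1 might be resolved, and then capped by the number of tasks, can be sketched as below. The function name `resolve_njobs` is hypothetical, and the use of `os.cpu_count()` to count CPUs is an assumption; the library may determine this differently.

```python
import os


def resolve_njobs(njobs, n_tasks):
    """Hypothetical helper: pick the number of parallel jobs to launch."""
    n_cpu = os.cpu_count() or 1  # cpu_count() can return None
    if njobs == -1:
        # -1 means "use every available CPU".
        njobs = n_cpu
    # Never start more jobs than there are tasks, and always start at least one.
    return max(1, min(njobs, n_tasks))


print(resolve_njobs(4, 2))   # 2: only two tasks, so two jobs suffice
print(resolve_njobs(2, 10))  # 2: the requested limit is respected
```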
- njobs
The number of jobs (processes) that will be run in parallel. This is adjusted based on the number of available CPUs and the length of the Runner_list.
- Type:
- Runner_list
A list of Runner objects, each representing a subset of the original task to be run in parallel.
- Type:
- split_run(Runner)[source]
Splits the original Runner task into multiple smaller Runner tasks for parallel processing.
- single_run(Runner)[source]
Executes the process() method of a single Runner and returns its output.
- process()[source]
Executes all Runners in the Runner_list in parallel, returning the combined output.
Examples
>>> runner = SomeRunnerClass()
>>> split_join_executor = SplitJoinParallel(runner, njobs=4)
>>> result = split_join_executor.process()
Notes
The Runner object must have specific attributes like HaloLightConeCatalog, LightconeShell, cosmo, model, mass_def, epsilon_max, and use_ellipticity.
The number of parallel jobs will be adjusted to match the number of splits if there are fewer splits than available CPUs.
Pickling errors can occur if the cosmology object (which is stored as an attribute in many classes) contains SwigPyObjects, which are not pickleable. This happens with P(k) calculations. The baryonification pipeline natively destroys these problem objects. If you still encounter pickling errors, explicitly check that these objects have been removed before the Runners are dispatched.
See also
joblib.Parallel
The underlying parallel execution library used for running the tasks.
joblib.delayed
A function to wrap the tasks to be executed in parallel.
- split_run(Runner)[source]
Splits the original Runner task into multiple smaller Runner tasks for parallel processing.
This method divides the halo catalog from the Runner into multiple subsets. Each subset is used to create a new Runner, which will process only that subset. The splitting process includes shuffling the catalog to optimize load balancing across processes.
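The shuffle-then-split step can be illustrated with a short sketch. It assumes a plain Python list in place of the halo catalog and uses the standard `random` module; the function name `split_catalog` is hypothetical, and the actual method may split the catalog differently (for example with NumPy).

```python
import random


def split_catalog(catalog, n_splits, seed=42):
    """Hypothetical helper: shuffle a catalog reproducibly, then split it."""
    rng = random.Random(seed)  # a seeded generator makes the shuffle reproducible
    shuffled = list(catalog)   # copy so the caller's catalog is left untouched
    rng.shuffle(shuffled)
    # Strided slices give chunks whose sizes differ by at most one entry.
    return [shuffled[i::n_splits] for i in range(n_splits)]


halos = list(range(10))  # stand-in for ten halo entries
chunks = split_catalog(halos, n_splits=3)
print([len(c) for c in chunks])  # [4, 3, 3]
```

Shuffling first means that expensive entries, for instance the most massive halos if the catalog is sorted by mass, are unlikely to all land in the same chunk.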
- single_run(Runner)[source]
Executes the process() method of a single Runner and returns its output.
This method is used to run a single Runner object.
- Parameters:
Runner (object) – An instance of a Runner object, which must have a process() method.
- Returns:
The output of the Runner’s process() method.
- Return type:
output
- process()[source]
Executes all Runners in the Runner_list in parallel, returning the combined output.
This method uses joblib’s Parallel and delayed functions to run each Runner’s process() method in parallel. The outputs are combined by summing them, which is appropriate if the contributions from each runner can be linearly summed. This is ideal for any profile painting tasks, such as PaintProfilesShell or PaintProfilesGrid.
- Returns:
map_out – A combined map output generated by summing the outputs from each Runner’s process() method.
- Return type:
ndarray
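Why summing is a valid way to join the outputs can be seen in a toy example. The sketch below assumes a one-dimensional "map" stored as a Python list rather than an ndarray, and the `paint` and `join` functions are hypothetical stand-ins for a painting Runner and the joining step.

```python
def paint(halos, n_pix=5):
    """Toy painting step: add each halo's mass to the pixel it falls in."""
    out = [0] * n_pix
    for pix, mass in halos:
        out[pix] += mass
    return out


def join(partial_maps):
    """Combine partial maps by summing them pixel by pixel."""
    return [sum(values) for values in zip(*partial_maps)]


halos = [(0, 1), (2, 2), (2, 3), (4, 5)]  # (pixel index, mass) pairs
full_map = paint(halos)
joined_map = join([paint(halos[:2]), paint(halos[2:])])
print(full_map)    # [1, 0, 5, 0, 5]
print(joined_map)  # [1, 0, 5, 0, 5]
```

Because each halo contributes additively to the map, painting disjoint subsets and summing the results reproduces the map obtained from the full catalog. An operation whose output is not a linear sum over halos would not satisfy this property.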