BaryonForge.utils.Parallelize module

class BaryonForge.utils.Parallelize.SimpleParallel(Runner_list, njobs=-1)[source]

Bases: object

A class to execute a list of Runner objects in parallel using a joblib instance.

The SimpleParallel class allows for the parallel execution of tasks encapsulated by Runner objects. It utilizes joblib’s parallel processing capabilities to distribute the workload across multiple CPU cores. This class is particularly useful for running computationally intensive tasks concurrently, thereby reducing the overall processing time.

Parameters:
  • Runner_list (list) – A list of Runner objects, each of which must have a process() method that defines the task to be executed.

  • njobs (int, optional) – The number of jobs (processes) to run in parallel. If set to -1, the number of jobs is set to the number of CPUs available. Default is -1.

single_run(i, Runner)[source]

Executes the process() method of a single Runner and returns its index and output.

process()[source]

Executes all Runners in the Runner_list in parallel, returning their outputs in the original order.

Examples

>>> runners = [Runner1(), Runner2(), Runner3()]
>>> parallel_executor = SimpleParallel(runners, njobs=2)
>>> results = parallel_executor.process()

Notes

  • The Runner objects must have a process() method implemented. This method will be called during the parallel execution.

  • The number of parallel jobs will be adjusted to match the number of Runners if there are fewer Runners than available CPUs.

  • Pickling errors can occur if the cosmology object (which is stored as an attribute in many classes) contains SwigPyObjects, which cannot be pickled. This happens with P(k) calculations. The baryonification pipeline natively destroys these problem objects. If you still hit pickling errors, check explicitly that no such objects remain on your Runners before calling process().
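The njobs behaviour described in the parameters and notes can be sketched as follows. This is a minimal illustration, not the class's actual code: resolve_njobs is a hypothetical helper, and os.cpu_count() is assumed here as the way to count CPUs.

```python
import os


def resolve_njobs(njobs, n_runners):
    """Hypothetical helper mirroring the documented njobs behaviour."""
    if njobs == -1:
        # os.cpu_count() can return None, so fall back to a single job.
        njobs = os.cpu_count() or 1
    # Never start more jobs than there are Runners to execute.
    return min(njobs, n_runners)


print(resolve_njobs(8, 3))  # 3
print(resolve_njobs(2, 5))  # 2
```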

See also

joblib.Parallel

The underlying parallel execution library used for running the tasks.

joblib.delayed

A function to wrap the tasks to be executed in parallel.

single_run(i, Runner)[source]

Executes the process() method of a single Runner and returns its index and output.

This method is used to run a single Runner object. It returns the index of the Runner and its output, which allows for the ordered aggregation of results.

Parameters:
  • i (int) – The index of the Runner in the Runner_list.

  • Runner (object) – An instance of a Runner object, which must have a process() method.

Returns:

A tuple containing the index of the Runner and the output of its process() method.

Return type:

tuple

process()[source]

Executes all Runners in the Runner_list in parallel, returning their outputs in the original order.

This method uses joblib’s Parallel and delayed functions to run each Runner’s process() method in parallel. The outputs are collected and sorted based on the original order of the Runner_list.

Returns:

A list of outputs from the process() methods of each Runner, in the order they were provided in Runner_list.

Return type:

list
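The index-tagging pattern described for single_run() and process() might look roughly like the sketch below. SquareRunner is a hypothetical stand-in for a real Runner, and the module-level single_run function only imitates the documented method; the library's internals may differ. The sketch passes prefer="threads" to keep it lightweight, whereas the class documents its jobs as processes.

```python
from joblib import Parallel, delayed


class SquareRunner:
    """Hypothetical Runner: only a process() method is required."""

    def __init__(self, value):
        self.value = value

    def process(self):
        return self.value ** 2


def single_run(i, runner):
    # Tag each output with its index so results can be re-ordered later.
    return i, runner.process()


runners = [SquareRunner(v) for v in (3, 1, 2)]

# Dispatch one delayed call per Runner (threads are an assumption of
# this sketch, chosen to avoid process start-up overhead).
tagged = Parallel(n_jobs=2, prefer="threads")(
    delayed(single_run)(i, r) for i, r in enumerate(runners)
)

# Sort by the returned index, then drop it to recover the input order.
results = [out for _, out in sorted(tagged, key=lambda pair: pair[0])]
print(results)  # [9, 1, 4]
```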

class BaryonForge.utils.Parallelize.SplitJoinParallel(Runner, njobs=-1, seed=42)[source]

Bases: object

A class to split a single Runner task into multiple parallel tasks and join the results.

The SplitJoinParallel class takes a single Runner object and splits its task into multiple smaller tasks to be run in parallel. It uses joblib for parallel execution, and the results are combined (joined) after all parallel tasks are completed. This approach is particularly useful for handling large datasets or computationally intensive tasks by distributing the workload across multiple CPU cores.

This class is currently not designed to work with the Baryonification operation and will raise an error if it is passed any Baryonification runner.

Parameters:
  • Runner (object) – A Runner object that defines the task to be executed. This object must have methods and attributes necessary for splitting the task, such as HaloLightConeCatalog, LightconeShell, cosmo, etc.

  • seed (int, optional) – A seed value for initializing the random number generator to ensure reproducibility when shuffling the halo catalog (shuffling is done to load-balance the jobs). Default is 42.

  • njobs (int, optional) – The number of jobs (processes) to run in parallel. If set to -1, the number of jobs is set to the number of CPUs available. Default is -1.

Runner

The original Runner object that is being split for parallel processing.

Type:

object

seed

A seed value used when shuffling the halo catalog. Default is 42.

Type:

int

njobs

The number of jobs (processes) that will be run in parallel. This is adjusted based on the number of available CPUs and the length of the Runner_list.

Type:

int

Runner_list

A list of Runner objects, each representing a subset of the original task to be run in parallel.

Type:

list

split_run(Runner)[source]

Splits the original Runner task into multiple smaller Runner tasks for parallel processing.

single_run(Runner)[source]

Executes the process() method of a single Runner and returns its output.

process()[source]

Executes all Runners in the Runner_list in parallel, returning the combined output.

Examples

>>> runner = SomeRunnerClass()
>>> split_join_executor = SplitJoinParallel(runner, njobs=4)
>>> result = split_join_executor.process()

Notes

  • The Runner object must have specific attributes like HaloLightConeCatalog, LightconeShell, cosmo, model, mass_def, epsilon_max, and use_ellipticity.

  • The number of parallel jobs will be adjusted to match the number of splits if there are fewer splits than available CPUs.

  • Pickling errors can occur if the cosmology object (which is stored as an attribute in many classes) contains SwigPyObjects, which cannot be pickled. This happens with P(k) calculations. The baryonification pipeline natively destroys these problem objects. If you still hit pickling errors, check explicitly that no such objects remain on your Runner before calling process().
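One way to perform the explicit pickling check mentioned in the last note is to try serializing each object before handing it to the parallel executor. The sketch below uses only the standard library; find_unpicklable is a hypothetical helper, and the SimpleNamespace objects are stand-ins for Runners (a threading.Lock plays the role of an unpicklable SwigPyObject). joblib's process backend may serialize objects with a different pickler, so treat this as a rough, conservative check.

```python
import pickle
import threading
from types import SimpleNamespace


def find_unpicklable(runners):
    """Return (index, error message) for every object that fails to pickle."""
    failures = []
    for i, runner in enumerate(runners):
        try:
            pickle.dumps(runner)
        except Exception as err:  # pickle can raise several exception types
            failures.append((i, f"{type(err).__name__}: {err}"))
    return failures


# Hypothetical stand-ins: a lock cannot be pickled, a plain dict can.
good = SimpleNamespace(cosmo={"Omega_m": 0.3})
bad = SimpleNamespace(cosmo=threading.Lock())

failures = find_unpicklable([good, bad])
for index, message in failures:
    print(f"runner {index} is not picklable -> {message}")
```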

See also

joblib.Parallel

The underlying parallel execution library used for running the tasks.

joblib.delayed

A function to wrap the tasks to be executed in parallel.

split_run(Runner)[source]

Splits the original Runner task into multiple smaller Runner tasks for parallel processing.

This method divides the halo catalog from the Runner into multiple subsets. Each subset is used to create a new Runner, which will process only that subset. The splitting process includes shuffling the catalog to optimize load balancing across processes.

Parameters:

Runner (object) – The original Runner object to be split.

Returns:

Runner_list – A list of new Runner objects, each configured to process a subset of the original Runner’s task.

Return type:

list
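The shuffle-then-split step can be illustrated with NumPy. This is a sketch under assumptions: split_indices is a hypothetical helper, and the real method may use a different random number generator or chunking rule. Each index chunk would then select the catalog rows for one new Runner; that construction is omitted because it depends on Runner internals not described here.

```python
import numpy as np


def split_indices(n_halos, njobs, seed=42):
    """Shuffle halo indices reproducibly, then cut them into njobs chunks."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_halos)
    # array_split tolerates n_halos not being divisible by njobs.
    return np.array_split(shuffled, njobs)


chunks = split_indices(n_halos=10, njobs=4)
print([len(c) for c in chunks])  # [3, 3, 2, 2]
```

Shuffling before splitting means that any ordering in the catalog (for example by mass or redshift) does not concentrate the most expensive halos in a single job.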

single_run(Runner)[source]

Executes the process() method of a single Runner and returns its output.

This method is used to run a single Runner object.

Parameters:

Runner (object) – An instance of a Runner object, which must have a process() method.

Returns:

The output of the Runner’s process() method.

Return type:

object

process()[source]

Executes all Runners in the Runner_list in parallel, returning the combined output.

This method uses joblib’s Parallel and delayed functions to run each Runner’s process() method in parallel. The outputs are combined by summing them, which is valid when the contributions from each Runner add linearly. This makes it well suited to profile-painting tasks, such as PaintProfilesShell or PaintProfilesGrid.

Returns:

map_out – A combined map output generated by summing the outputs from each Runner’s process() method.

Return type:

ndarray
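The reason summation is a valid join for painting tasks can be checked with a toy example: when each halo adds its contribution to a map, painting subsets separately and summing the partial maps gives the same result as painting everything at once. The paint function below is a hypothetical, highly simplified stand-in for a real painting Runner, and the partial maps are computed serially for brevity.

```python
import numpy as np


def paint(pixels, weights, npix):
    """Toy painter: add each halo's weight into its pixel of a 1-D map."""
    out = np.zeros(npix)
    np.add.at(out, pixels, weights)  # accumulates repeated pixels correctly
    return out


rng = np.random.default_rng(0)
npix, n_halos = 16, 100
pixels = rng.integers(0, npix, size=n_halos)
weights = rng.random(n_halos)

# Paint the whole catalog in one go.
full_map = paint(pixels, weights, npix)

# Paint four subsets independently, then join by summing the partial maps.
chunks = np.array_split(np.arange(n_halos), 4)
partial_maps = [paint(pixels[c], weights[c], npix) for c in chunks]
map_out = np.sum(partial_maps, axis=0)

print(np.allclose(map_out, full_map))  # True
```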