Software: Difference between revisions

From imk-tro wiki
(Created page with " == Essential Software == The following is a list of essential software packages and the links to their web pages: Many of these can be installed using anaconda [https://www....")
 
No edit summary
 
(3 intermediate revisions by one other user not shown)
Line 29: Line 29:
'''Data analysis''':
'''Data analysis''':
*NumPy [https://numpy.org/]
*NumPy [https://numpy.org/]
*xarray [https://docs.xarray.dev/en/stable/#]
*xarray [https://docs.xarray.dev/en/stable/#], easy handling of large nc files.
*pandas [https://pandas.pydata.org/]
*pandas [https://pandas.pydata.org/]
*SciPy [https://scipy.org/]
*SciPy [https://scipy.org/]
Line 38: Line 38:
*PyTorch [https://pytorch.org/]
*PyTorch [https://pytorch.org/]
*TensorFlow [https://www.tensorflow.org/]
*TensorFlow [https://www.tensorflow.org/]
*multiprocessing [https://docs.python.org/3/library/multiprocessing.html]
'''Data visualization''':
'''Data visualization''':
*seaborn [https://seaborn.pydata.org/]
*seaborn [https://seaborn.pydata.org/]
Line 58: Line 59:
* Paraview for 3D rendering [https://www.paraview.org/], note this one requires more computational power and should ideally be run through a supercomputer via VNC client.
* Paraview for 3D rendering [https://www.paraview.org/], note this one requires more computational power and should ideally be run through a supercomputer via VNC client.
* Cartopy (python package) for maps [https://scitools.org.uk/cartopy/docs/latest/]
* Cartopy (python package) for maps [https://scitools.org.uk/cartopy/docs/latest/]
* Matplotlib (standard python package) [https://matplotlib.org/]


== Colourblind Friendly Plotting Tools==

The following websites provide information on, and palette generators for, colourblind-friendly plots:

* https://colorbrewer2.org/#type=sequential&scheme=BuPu&n=9
* https://hclwizard.org/
* http://hclwizard.org:3000/hclwizard/

== HPC (High-performance computing) commands ==
Useful commands for Slurm [https://slurm.schedmd.com/], which is mainly used on large clusters:
* sbatch [https://slurm.schedmd.com/sbatch.html]
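:- The --array option is useful for submitting a job array, e.g. sbatch --array=0-99 script.sh. Each task receives its own index (available in the environment variable SLURM_ARRAY_TASK_ID), which the script can use as an argument, for example to apply the same analysis to many files or to chunks of a large dataset.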
* squeue -l
* salloc
* sinfo

Latest revision as of 09:31, 23 June 2023

Essential Software

The following is a list of essential software packages and the links to their web pages:

Many of these can be installed using anaconda [1].

CDO

https://code.mpimet.mpg.de/projects/cdo/.

Note that CDO has many built-in functions that are not well documented, but details about them can usually be found in the CDO discussion forums, https://code.mpimet.mpg.de/projects/cdo/boards.

Python

Instructions on how to download and install Python for all OSs can be found at: https://www.python.org/.

It is usually recommended to use the Anaconda distribution to install Python. Details on how to do this are here: https://www.anaconda.com/

Python can be combined with a good Integrated Development Environment (IDE) of your choice. All existing IDEs have their pros and cons. Some of the most popular IDEs are the following:

Python boasts a large number of packages. Some of the most used packages for manipulating large files in NetCDF, HDF5 or CSV formats are the following:

Data analysis:

  • NumPy [9]
  • xarray [10], easy handling of large nc files.
  • pandas [11]
  • SciPy [12]

Parallel computing, Machine learning, Deep learning etc.:

Data visualization:

Most of these packages are distributed through either Conda [24] or Pip [25].

On top of these, various users also create packages tuned for specific purposes. They are usually made public through GitHub [26].

Various forums exist for the sole purpose of answering specific coding questions. One such forum is Stack Overflow [27]. Medium [28] is also a good source for reading about new ideas and tools in Python and other languages.

Visualisation

The visualisation of "nc" files can be made easier using:

  • NcView [29]
  • Panoply [30]
  • Paraview for 3D rendering [31], note this one requires more computational power and should ideally be run through a supercomputer via VNC client.
  • Cartopy (python package) for maps [32]
  • Matplotlib (standard python package) [33]


Colourblind Friendly Plotting Tools

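The following websites provide information on, and palette generators for, colourblind-friendly plots:

  • https://colorbrewer2.org/#type=sequential&scheme=BuPu&n=9
  • https://hclwizard.org/
  • http://hclwizard.org:3000/hclwizard/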

HPC (High-performance computing) commands

Useful commands for Slurm [34], which is mainly used on large clusters:

  • sbatch [35]

- The --array option is useful for submitting a job array, e.g. sbatch --array=0-99 script.sh. Each task receives its own index (available in the environment variable SLURM_ARRAY_TASK_ID), which the script can use as an argument, for example to apply the same analysis to many files or to chunks of a large dataset.
  • squeue -l
  • salloc
  • sinfo
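
The job-array pattern mentioned for sbatch could be sketched in Python roughly as follows. Slurm sets the environment variable SLURM_ARRAY_TASK_ID for each task of an array; everything else here (the helper name select_input, the data_000.nc ... data_099.nc file names, and the idea that script.sh simply calls this Python script) is an assumption made for illustration only.

```python
import os


def select_input(files, env=None):
    """Return the file assigned to the current array task."""
    env = os.environ if env is None else env
    # Slurm sets SLURM_ARRAY_TASK_ID for every task of a job array.
    # Defaulting to 0 lets the script also run outside Slurm.
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(files):
        raise IndexError(f"task index {task_id} has no matching file")
    return files[task_id]


if __name__ == "__main__":
    # Hypothetical input list: 100 files, matching --array=0-99.
    files = [f"data_{i:03d}.nc" for i in range(100)]
    print(select_input(files))
```

With sbatch --array=0-99 script.sh, each of the 100 tasks would then pick a different file, so the analysis itself can stay a plain single-file script.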