SoftwareFramework (15 Jun 2012, AntoniaMey)

- ° financial (stock market),
- ° meteorological (sensor and satellite measurements) or
- ° physical experiments (mass spectrometry).

- ° Which algorithms should be contained in the framework?
- ° Which standards should be used?
- ° How are users supposed to access the framework?
- ° How should large quantities of data be handled over the network? Can clusters be used?
- ° Which programming languages should be used?
- ° What kind of visualization should be used?

`GROMACS`

`MolTools`: `Java` tools by F. Noe. Featuring a trajectory implementation, this toolbox allows (amongst other things) testing trajectories for Markovian properties, creating transition matrices (and finding their eigenvalues/eigenvectors), and mapping microstates to metastable states.
- Diagram_Trajectory: `Trajectory` implementation of `MolTools`

The `Java` library contains algorithms which aid in PCCA. The problem is:
- How easily can algorithms from MolTools be implemented/used together with other routines?
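
To make the "creation of transition matrix" step concrete, here is a minimal sketch in Python. It is not the MolTools API: the function name `estimate_transition_matrix`, the `lag` parameter and the toy trajectory are assumptions made for illustration. It counts transitions between discrete states at a given lag time and row-normalizes the counts.

```python
from collections import Counter


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from a discrete trajectory.

    Hypothetical helper, not part of MolTools: counts transitions i -> j
    observed `lag` steps apart and normalizes each row by its total count.
    """
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # No outgoing transition was observed for state i: keep a zero row.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# Toy trajectory over two states (illustrative data only).
dtraj = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
T = estimate_transition_matrix(dtraj, n_states=2)
```

Each row of `T` with at least one observed transition sums to one. Eigenvalues and eigenvectors of such a matrix would then be computed with a linear algebra library, which is outside this sketch.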

`OpenMS`: `C++` framework for mass spectrometry. It contains tools to analyse spectrometry data (e.g. peptide and protein identification and clustering) and libraries for LC-MS data management. `OpenMS` can find mass spectrometric peaks in raw LC-MS data (in mzData format). Peptides can be recognized by a special isotopic pattern.
Furthermore, it provides a framework for the development of mass spectrometry related software.

The main purpose of the `C++` tools is to assist in liquid chromatography-mass spectrometry (LC-MS).
This program is very specialized, fit for the single task of aiding in LC-MS. It is probably of little use in the Phase Profiler Project.

`Metamacs`: `Java` library for simulation and analysis of metastable Markov chains. It contains (amongst others) multiple algorithms for molecular structure alignment, time series discretization, HMMs, etc.
- Diagram_TimeSeries: `TimeSeries` implementation of `Metamacs`

`COLT` package by CERN.

`Metamacs` contains the following algorithms:
- ° Computes an optimal alignment for molecular structures in terms of the mean square distance while the position of one atom/one axis is fixed
- ° Discretizes a time series, Inner Simplex Algorithm (ISA) from Marcus Weber
- ° Graph Theory: Dijkstra (shortest path), representing flow, transition pathways between two metastable sets in a rough energy landscape
- ° Langevin dynamics, Lennard-Jones cluster, Mueller Potential, Ryckaert-Bellemans united atoms
- ° HMM: computes the likelihood of an observation series by means of backward variables or forward variables; Baum-Welch: estimates the model parameters which maximize the likelihood of the given observations; generates a realization of a Hidden Markov Model with the output distribution of specified parameters; Viterbi: computes the most likely state path q* for a given observed time series
- ° Deterministic and stochastic integrators for Hamiltonian systems
- ° Linear algebra subroutines like an eigenvalue solver
- ° Markov chain Monte Carlo sampling methods
- ° Variants of the string method for finding transition paths in (rough) energy landscapes
- ° Wrapper for Gromacs Pipe Interface
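
As an illustration of the HMM item above (likelihood of an observation series by means of forward variables), the following Python sketch implements the textbook forward recursion. It is not Metamacs code: the function name `forward_likelihood` and the two-state model parameters are assumptions chosen for the example, and the sketch omits the scaling that long observation series need to avoid numerical underflow.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using forward variables.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from hidden state i to j
    emit[i][o]  : probability of emitting symbol o from hidden state i
    """
    n = len(start)
    # Initialization: alpha_1(i) = start[i] * emit[i][obs[0]]
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Recursion: alpha_{t+1}(j) = (sum_i alpha_t(i) * trans[i][j]) * emit[j][o]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    # Termination: sum the forward variables of the last time step.
    return sum(alpha)


# Hypothetical two-state, two-symbol model (illustrative values only).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
likelihood = forward_likelihood([0, 1], start, trans, emit)
```

A quick sanity check for such an implementation is that the likelihoods of all possible observation series of a fixed length sum to one.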

`COLT` library:
- ° Fundamental general-purpose data structures optimized for numerical data, e.g.
- °° Dense and sparse matrices (multi-dimensional arrays), linear algebra, resizable arrays, associative containers, buffer management

The `Java` library contains many useful algorithms for discretization of time series, molecular structure alignment, and analysis of HMMs.

`Aida/FreeHEP`: The `AIDA` project aims at developing abstract interfaces for common physics analysis objects, such as histograms and clouds. Tools which implement `AIDA` interfaces can exchange objects in an `XML` format.
There are `AIDA` implementations in `Java` (JAIDA), `C++` and `Python`. JAIDA is a subproject of `FreeHEP`, another open source high-energy physics `Java` library. Files written with JAIDA adhere to the AIDA IO standards and can be read by any AIDA compliant analysis system.
Interesting libraries:
- ° FreeHep Physics (collection of High Energy Physics related classes, including 3- and 4-vectors, simple matrices, particles and events, particle properties and jet finding)
- ° `JAIDA`: clouds, data points (1D, 2D, 3D), histograms, ...

- ° Linear Algebra
- ° Time Series Analysis
- ° Operations on matrices

- ° **Templated multi-dimensional matrices**: Dense and sparse fixed-size (non-resizable) 1-, 2-, 3- and d-dimensional matrices holding objects or primitive data types such as int, double, etc.; also known as multi-dimensional arrays or data cubes.
- ° **Linear Algebra**: Standard matrix operations and decompositions: LU, QR, Cholesky, Eigenvalue, Singular value.
- ° **Statistics**: Tools for basic and advanced statistics: Estimators, Gamma functions, Beta functions, Probabilities, Special integrals, etc.

- ° **Representation of sparse matrices in MATLAB's eigs function** (using ARPACK):
- MATLAB uses the **Harwell-Boeing** format. This method uses three arrays internally to store sparse matrices with real elements. Consider an m-by-n sparse matrix with nnz nonzero entries stored in arrays of length nzmax:

- The first array contains all the nonzero elements of the matrix in floating-point format. The length of this array is equal to nzmax.

- The second array contains the corresponding integer row indices for the nonzero elements stored in the first nnz entries. This array also has length equal to nzmax.

- The third array contains n integer pointers to the start of each column in the other arrays and an additional pointer that marks the end of those arrays. The length of the third array is n+1.
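
The three-array layout described above can be sketched in Python as follows. This is an illustration of compressed column storage in general, not MATLAB's internal code: the helper names `to_csc` and `csc_matvec` are hypothetical, indices are 0-based, and the arrays are allocated with exactly nnz entries (that is, nzmax = nnz).

```python
def to_csc(dense):
    """Convert a dense matrix (list of rows) into three compressed-column arrays."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values = []          # first array: nonzero elements, column by column
    row_indices = []     # second array: row index of each stored element
    col_pointers = [0]   # third array: start of each column, plus an end marker
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(float(dense[i][j]))
                row_indices.append(i)
        col_pointers.append(len(values))
    return values, row_indices, col_pointers


def csc_matvec(values, row_indices, col_pointers, x, n_rows):
    """Multiply the stored sparse matrix with a dense vector x."""
    y = [0.0] * n_rows
    for j in range(len(col_pointers) - 1):
        # Entries of column j live in the half-open range given by the pointers.
        for k in range(col_pointers[j], col_pointers[j + 1]):
            y[row_indices[k]] += values[k] * x[j]
    return y


A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
values, row_indices, col_pointers = to_csc(A)
y = csc_matvec(values, row_indices, col_pointers, [1.0, 1.0, 1.0], n_rows=3)
```

For the 3-by-3 example, the pointer array has n+1 = 4 entries, and the difference between consecutive pointers gives the number of nonzeros in each column.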

- Revised simplex method.
- Primal-dual interior point method.
- Branch-and-bound method.
- Translator for GNU MathProg.
- Application program interface (API).
- Stand-alone LP/MIP solver.

- Clustering
- Failover (including sessions)
- Load balancing
- Distributed caching (using JBoss Cache, a standalone product)
- Distributed deployment (farming)
- Enterprise JavaBeans version 3

- X-Y charts
- Pie charts
- Gantt charts
- Bar charts

`Amira`: `C++`, uses `OpenGL`.
- ° Visualization of static molecules as well as time dependent data (trajectories).
- ° Computation and visualization of configuration densities from trajectories
- ° Flexible and fast ball and stick visualization, flexible color schemes
- ° "BondAngle-style" visualization
- ° Flexible and convenient tools to select and display parts of a molecule including color management
- ° Extraction and visualization of molecular surfaces (van der Waals surfaces and solvent accessible surfaces)
- ° Visualization of the backbone
- ° Computation and visualization of secondary structures and hydrogen bonds
- ° Visualization of additional quantities like scalar or vector fields with color coding
- ° Simultaneous display of multiple molecules
- ° Measuring of lengths and angles in molecules
- ° Sequence and structural alignment of molecules
- ° All molecular visualization tools can be arbitrarily combined with the standard Amira modules like volume rendering, slicing, or iso-surfaces

- ° Convenient self-describing format: the header contains all meta information (data types used, length, etc.)
- ° No maximum file size
- ° Easy network access
- ° Reading and writing a portion of a dataset is possible

- ° netCDF (since version 4) is built on top of HDF5

`NetCDF`

The `NetCDF` format is platform independent and uses the HDF5 format. Core libraries for `NetCDF` access exist in `C++`, `Fortran` and `Java`. An extension of `NetCDF` for parallel computing called `Parallel-NetCDF` exists.
More about NetCDF and its usefulness to the project here.
Advantages:
- ° Platform independence
- ° Interoperability with existing `NetCDF` projects
- ° Predefined data types for:
- °° General scientific data: coordinate systems, gridded data, radial data, ...
- °° Meteorological data
- °° Trajectories (TrajectoryObsDataset), time series station data

- ° Access, for example, with the NetCDF-Java Library (predefined open() and process() methods).

Data Format | Libraries | Used by Institution | More information |
---|---|---|---|
Hierarchical Data Format (HDF) | supporting libraries | National Center for Supercomputing Applications (NCSA) | HDF5 |
IRIS Explorer format | NAG Library | Numerical Algorithms Group (NAG) | import data |
Matrix Market Exchange Formats | BLAS, LINPACK, LAPACK | netlib: Matrix Market | Matrix file formats |
HDF, netCDF, netCDF Operators (NCO) | ARPACK, ATLAS, BLAS, LAPACK, METIS, PBLAS, more... | National Center for Computational Sciences (NCCS) | more |
netCDF | - | National Energy Research Scientific Computing Center (NERSC) | more |
- | CodeLib | Zuse Institute Berlin (ZIB) | more |
HDF | BLAS, LAPACK, FFTs, NAMD | National Renewable Energy Laboratory (NREL) | more |
- | LAPACK | High Performance Center Stuttgart (HLRS) | more |
- | CERNLIB, Physics Analysis Workstation (PAW), ROOT | CERN | - |
- | - | J. Craig Venter Institute | - |

Data Format | Libraries | Used by Institution | More information |
---|---|---|---|
- | Matlab, Molekel, UCSF Chimera, VMD | Swiss National Supercomputing Centre (SNSC) | more |

I | Attachment | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|
EXT | Diagram_TimeSeries | manage | 31 K | 19 Sep 2007 - 12:06 | UnknownUser | TimeSeries implementation of Metamacs |
EXT | Diagram_Trajectory | manage | 59 K | 19 Sep 2007 - 12:05 | UnknownUser | Trajectory implementation of MolTools |


Topic revision: r31 - 15 Jun 2012, AntoniaMey