PARMON
A Comprehensive Cluster Monitoring System
PARMON Team
Centre for Development of Advanced Computing,
Bangalore, India
http://www.cdacindia.com
Project Leader: Rajkumar Buyya
Topics of Discussion
PARMON System Model & Architecture
    PARMON Server
    PARMON Client
PARMON Features and Services
PARMON Installation and its Usage
Monitoring with PARMON
PARMON Integration with other products
Conclusions and Future Directions
Motivations
Workstation clusters have of late become a cost-effective solution for high-performance computing (HPC).
C-DAC's PARAM OpenFrame is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks.
Monitoring such large systems is tedious and challenging, since a typical workstation is designed to work as a standalone system rather than as part of a workstation cluster.
System administrators need tools to monitor such systems effectively. PARMON addresses this problem.
C-DAC HPCC Software Architecture
[Figure: C-DAC HPCC software stack; components as labeled in the diagram]
APPLICATIONS
SYSTEM MANAGEMENT TOOLS
Parallel File System: C-PFS
Development Tools: F90 IDE, DIVIA
Languages: C, F77, F90
Message Passing Interfaces: C-MPI, PVM
Light Weight Protocols
SOLARIS
CLUSTER HARDWARE
PARMON - Salient Features
Online creation of the node and group database
Monitoring of system activities at the component, node, group, or entire-cluster level
Designed using state-of-the-art Java technology
Monitoring of system components:
    CPU, memory, disk, and network
    Multiple instances of the same component can be monitored
Facility for defining events and automatic notification
Miscellaneous facilities: message broadcast, invocation of system management commands (halt, reboot, etc.), system information and configuration
PARMON provides a GUI for initiating activities/requests and presents results graphically.
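The event definition and automatic notification facility listed above could be pictured roughly as follows. This is a minimal Java sketch, not PARMON's actual code: the class ThresholdEvent, its fields, the metric name, and the message format are all assumptions made for illustration, since the slides do not describe how events are defined internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical event: fires when a sampled metric exceeds a limit. */
class ThresholdEvent {
    private final String metric;              // illustrative name, e.g. "cpu"
    private final double limit;               // percentage above which the event fires
    private final Consumer<String> notifier;  // assumed notification channel

    ThresholdEvent(String metric, double limit, Consumer<String> notifier) {
        this.metric = metric;
        this.limit = limit;
        this.notifier = notifier;
    }

    /** Checks one sample; notifies and returns true if the limit is exceeded. */
    boolean check(String node, double value) {
        if (value > limit) {
            notifier.accept(node + ": " + metric + " at " + value
                    + "% exceeds " + limit + "%");
            return true;
        }
        return false;
    }
}

class EventDemo {
    public static void main(String[] args) {
        List<String> inbox = new ArrayList<>();   // stands in for the administrator
        ThresholdEvent cpuHigh = new ThresholdEvent("cpu", 90.0, inbox::add);
        cpuHigh.check("node01", 42.0);   // below the limit: no notification
        cpuHigh.check("node02", 97.5);   // above the limit: one notification
        inbox.forEach(System.out::println);
    }
}
```

Passing the notifier as a Consumer keeps the event definition separate from the delivery mechanism, so the same event could be routed to a dialog, a log, or a mail gateway.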
PARMON System Model
[Figure: PARMON client (parmon) running on a JVM; PARMON server (parmond) running on each Solaris node; nodes connected by a high-speed switch]
PARMON Implementation
Server
    Multithreaded, using POSIX and Solaris threads
    Developed in C, as it needs to access system internals
    Stateless server
Client
    Developed in Java
    Java features are used extensively
    A new window is created for each client request and interacts with the server
    Threads are used extensively when creating online resource utilization meters
    Reconfigures dynamically when the node database changes
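To illustrate how a client might use one thread per online utilization meter, here is a minimal Java sketch. It is an assumption-laden illustration rather than PARMON's code: UtilizationMeter, the DoubleSupplier standing in for a request to parmond, and the fixed sample count are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.DoubleSupplier;

/** Hypothetical meter: polls a source a fixed number of times on its own thread. */
class UtilizationMeter implements Runnable {
    private final DoubleSupplier source;   // stands in for a request to the server
    private final int samples;             // how many readings to take
    private final long intervalMillis;     // pause between readings
    private final List<Double> readings = new CopyOnWriteArrayList<>();

    UtilizationMeter(DoubleSupplier source, int samples, long intervalMillis) {
        this.source = source;
        this.samples = samples;
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < samples; i++) {
                readings.add(source.getAsDouble());
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // stop polling when interrupted
        }
    }

    /** Readings collected so far; safe to read while the meter is running. */
    List<Double> readings() {
        return readings;
    }
}

class MeterDemo {
    public static void main(String[] args) throws InterruptedException {
        // Constant suppliers replace real server queries in this sketch.
        UtilizationMeter cpu = new UtilizationMeter(() -> 37.5, 3, 10);
        UtilizationMeter mem = new UtilizationMeter(() -> 61.0, 3, 10);
        Thread t1 = new Thread(cpu, "cpu-meter");
        Thread t2 = new Thread(mem, "mem-meter");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("cpu=" + cpu.readings() + " mem=" + mem.readings());
    }
}
```

Because each meter carries everything it needs for its own requests, this arrangement fits a stateless server: nothing has to be remembered on the server side between two polls.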
Setting up of PARMON
Server installation & invocation
    Binds to a port
    Rights: requires root permission for full functionality
    Invocation: parmond or parmond <port-no> (either at boot time or online)
    Must be loaded on all nodes to be monitored
Client installation & invocation
    Java-based client (the client machine can be any PC or workstation supporting a JVM)
    CLASSPATH must point to classes.zip and parmon.jar
    jar file (parmon.jar)
    Invocation: java parmon or java parmon <port-no>
Monitoring System Activities and Resource Utilization
PARMON Launcher
Creation of Node Database
Node Deletion
Group Creation
Group Modification/Deletion
Resource Utilization at a Glance
Selection of Nodes/Group
CPU Usage Monitoring
Memory Usage Monitoring
Disk/Network Usage Monitoring
Message Viewer (System logs)
Process Activities
Kernel Data Catalog - CPU
Kernel Data Catalog - Memory
Kernel Data Catalog - Disk
Kernel Data Catalog - Network
Catalog of CPU Parameters
Component View - Physical
Component View - Logical
Message Broadcast
System Configuration
System Information
Issuing Commands: halt, shutdown, etc.
Node Diagnostics - Online (SunVTS)
Online Help
PARMON Integration with other Products
PARMON can send resource utilization information to any other product, provided that product's protocols are made available.
[Figure: parmond on Node 1 through Node N; PARAM online bulletin board]
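One way to picture this kind of integration is an adapter per external product. The Java sketch below is purely illustrative: the UtilizationSink interface, the TextBoardSink class, and the text line format are assumptions, and the real protocol of the PARAM online bulletin board is not described in these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical adapter: one implementation per external product protocol. */
interface UtilizationSink {
    void publish(String node, double cpuPercent, double memPercent);
}

/** Illustrative sink that renders each reading as a plain-text line. */
class TextBoardSink implements UtilizationSink {
    final List<String> lines = new ArrayList<>();

    @Override
    public void publish(String node, double cpuPercent, double memPercent) {
        lines.add(String.format(Locale.ROOT, "%s cpu=%.1f%% mem=%.1f%%",
                node, cpuPercent, memPercent));
    }
}

class IntegrationDemo {
    /** Sends one reading to every registered sink. */
    static void forward(List<UtilizationSink> sinks, String node,
                        double cpuPercent, double memPercent) {
        for (UtilizationSink sink : sinks) {
            sink.publish(node, cpuPercent, memPercent);
        }
    }

    public static void main(String[] args) {
        TextBoardSink board = new TextBoardSink();
        List<UtilizationSink> sinks = new ArrayList<>();
        sinks.add(board);
        forward(sinks, "node01", 37.5, 61.0);
        board.lines.forEach(System.out::println);  // node01 cpu=37.5% mem=61.0%
    }
}
```

Supporting a further product would then mean adding another UtilizationSink implementation, without touching the code that collects the readings.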
Conclusions and Future Directions
PARMON has been used successfully to monitor the PARAM OpenFrame supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system.
Portable across platforms that support Java
Comprehensive monitoring support and GUI
PARMON supports Solaris and Linux clusters; support for NT clusters is planned.
Can easily be extended to support web-based monitoring of clusters by creating an interface server (running on a web server) between the client and the PARMON servers running on the cluster nodes.
References
Project Team: Rajkumar Buyya, Krishna Mohan, Bindu Gopal
R. Buyya, PARMON: A Portable and Scalable Monitoring System for Clusters, International Journal on Software: Practice & Experience (SPE), John Wiley & Sons, Inc., USA, June 2000.
Further Info: http://www.buyya.com/parmon
C-DAC: http://www.cdacindia.com
Thank YOU