Data Analysis: Algorithms & Methods
Highlights
Vincenzo Innocente (CERN-CMS)
Ed Frank (Univ. of Pennsylvania - BaBar)
CHEP 2000 Highlights from
Session A
Contributions
General Architecture 12
Foundation Libraries 3
Detector reconstruction (all but one: tracking!)
Focus on Program Structure 7
Strictly Algorithms 3
Simulation 8
Detector description 4
Architecture
ORCA Software & Architecture
• When the project started, most people were worried about how to bring on the physicists, develop the sub-detector software, etc.
• That was important, and the major emphasis of the last year, but it is actually less critical in the long term.
• Engineering of the architecture, and crucially the data-handling issues, are really the critical items.
• Tracking algorithms can, and will, be rewritten many times. But having an architecture that allows, and keeps track of, plug-and-play is vital.
• Even now we face very large datasets (multi-TB). Production, automation, mirroring and evolution are (some of) the hard issues.
Reconstruction is much more than the reconstruction code.
Offline Architecture:
New Requirements
• Bigger experiment, higher rate, more data
• Larger and dispersed user community performing non-trivial queries against a large event store
• Make best use of new IT technologies
• Increased demand for both flexibility and coherence
  - ability to plug in new algorithms
  - ability to run the same algorithms in multiple environments
  - guarantees of quality and reproducibility
  - high-performance user-friendliness
CMS (offline) Software
[Figure: CMS offline data flow. Boxes: Online Monitoring, Slow Control, Environmental data, Event Filter / Objectivity Formatter, Quasi-online Reconstruction, Simulation (G3 and/or G4), Data Quality, Calibrations, Group Analysis, User Analysis on demand. All are connected to the Persistent Object Store Manager, which sits on top of the Object Database Management System. Arrow labels: store; Store rec-Obj; Store rec-Obj and calibrations; Request part of event.]
March 2000 HLT Production Plans
• 2M events ORCA-reconstructed with high-luminosity pile-up
• 2-4 terabytes in Objectivity/DB
• 400 CPU-weeks
• ~6 production units
• ~1-2 production units off the CERN site
• Copy of all data at CERN in HPSS; use of the IT/ASD AMS back-end to stage data to ~1 TB of disk pools
• Mirroring of data to a few off-site centers, including trans-Atlantic
Users want (need!) now what they were promised for 2005.
Offline Architecture:
Solution
One coherent architecture from online
event filtering to final physics analysis
Clear definition of Clients’ and Services’
interfaces and roles
Framework which orchestrates instances of
all these modules
Set of common foundation libraries
Software Structure
[Figure: layered software structure, top to bottom.]
• Applications implementing the physics algorithms: Triggers, Reconstruction, Simulation, Analysis.
• Frameworks and toolkits: one main framework, GAUDI, plus various specialised frameworks: visualisation, persistency, interactivity, simulation (Geant4), etc.
• Foundation libraries: basic libraries such as STL, CLHEP, etc. (Vocabulary)
DØ C++ Framework
• Set of well-established interfaces from which reconstruction and analysis algorithms are built.
• Propagates events through a set of algorithms in a well-defined and established manner.
• The algorithm configuration and set is determined at program execution time.
• The framework hides many system-related complexities from the user and the algorithm developer, and allows sharing of code for common or related tasks.
Offline Architecture:
Enabling Technologies
C++ & OO
Run Time Dynamic Loading
Event Driven Notification
State Machines
Persistent Object Store
Database Technologies
Networked Client-Server Architectures
Layered Architecture to shield the user
from the above!
CLEO III
Dynamic Loading vs. Static Linking
• Both equally well supported; can mix.
• Static linking required for reconstruction jobs
  - need a stable environment for long periods of time
• Dynamic linking/loading for rapid code development
  - fast turn-around time needed
  - cutting link times from hours/minutes to minutes/seconds
• Limit the number of libraries to link to:
  - proper layering of code
  - separation of data types from the algorithms that supply them (why would I have to link to a tracker to access tracks?)
  - no direct links between objects reduces the number of libraries to link to; instead we use index-list objects (“Lattice”)
• Run-time cost of resolving symbols is low!
CMS
Conclusions
• An “implicit invocation” architecture is a flexible software solution which can scale with the complexity of the CMS project.
• An ODBMS, integrated into the framework,
  - provides coherent management of persistent objects
  - coupled with run-time dynamic loading, allows an application to be configured automatically
• The framework can effectively shield physics modules from the underlying technology without penalizing performance
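The idea of “implicit invocation” can be illustrated with a minimal sketch: modules register interest in an occurrence and are called back when it is announced, instead of being called by name. The names `Dispatcher`, `subscribe`, `notifyNewEvent` and `EventLogger` are hypothetical and chosen for illustration only; this is not the CMS framework code.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical dispatcher: modules register interest and are invoked implicitly.
class Dispatcher {
public:
    using Handler = std::function<void(int eventNumber)>;

    // A module subscribes once; nobody calls it by name afterwards.
    void subscribe(Handler handler) { handlers_.push_back(std::move(handler)); }

    // Announcing a new event invokes every subscribed handler in registration order.
    void notifyNewEvent(int eventNumber) const {
        for (const Handler& handler : handlers_) handler(eventNumber);
    }

private:
    std::vector<Handler> handlers_;
};

// Toy module that records which events it has been notified about.
// It must outlive the dispatcher calls, because the handler captures `this`.
struct EventLogger {
    std::vector<int> seen;
    void attach(Dispatcher& dispatcher) {
        dispatcher.subscribe([this](int eventNumber) { seen.push_back(eventNumber); });
    }
};
```

The dispatcher knows nothing about the modules it notifies, so a new module can be attached (for example after being dynamically loaded) without touching existing code.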
Component-based Architecture
NOVA Architecture
[Figure: NOVA component diagram. Labels recoverable from the diagram:
 - Remote Analysis / Remote Clients: Web browser, Mobile Analysis Client, Dynamically loaded apps, Visualisation, nanoDST, GCA Query, Client Data Binder Module
 - Regional Center: Analysis Daemon, Analysis Server, Server Data Binder Module
 - Middleware Components: Offline Control Framework, State DB, State Server, Database Navigator, Web Server, Monitoring Module, MySQL Client, MySQL Analysis Catalogue, MySQL Data Catalogue, Catalog Interface, Parameters Repository, Data Repository, Data Management, CVS Code Repository, Bug system, HyperNews, Grand Challenge Architecture (GCA)
 - Legend: NOVA component; existing third party tool employed by NOVA; third party tool customized for and integrated into NOVA; application specific, sample implementation provided
 - Status: Implemented, Prototyped, Planned]
Offline Architecture:
Commonalities and Differences
• Event data reduction
  - externally: Pipes & Filters
  - internally: Blackboard
  - CMS: Action on Demand
• External services (geometry, run conditions, etc.)
  - mainly procedural
  - CMS and DØ: “Event” notification (implicit invocation)
[Figure: modules (Emc Clustering, Track Associator) and the collections they exchange: lots of EmcDigis, EmcClusters, RecoTracks, Associations.]
Offline Architecture:
Commonalities and Differences
• Distinction among data, detector and algorithms
  - only BaBar makes no clear distinction
• Access to object collections by name
  - everybody uses named registries (flat or tree)
  - central component of Gaudi (LHCb) services
• Persistency insulation layer:
  - transient copy (managed by the framework)
  - direct smart pointer
Principal design choices
• Separation between “data” and “algorithms”
  - data objects primarily carry data and have only basic methods, e.g. tracking hits
  - algorithm objects primarily manipulate data, e.g. track fitter
• Three basic categories of data:
  - “event data” (obtained from particle collisions, real or simulated)
  - “detector data” (structure, geometry, calibration, alignment, ...)
  - “statistical data” (histograms, ...)
• Separation between “transient” and “persistent” data
  - isolate user code from persistency technology
  - different optimisation criteria
  - transient as a bridge between independent representations
Module, event and environment structure
[Figure: modules (Emc Clustering, Track Associator) and the collections they exchange: lots of EmcDigis, EmcClusters, RecoTracks, Associations.]
• Modules provide the algorithms
  - use existing information to create new objects
  - styles range from procedural monoliths to OO castles
• Framework/AC++ provides control and configuration
  - uses TCL scripting, command line
  - production executables run 300 modules
• Objects have behaviors, not just values
  - “Networks of objects collaborate to provide semantics”
  - internal form of our track objects is irrelevant
• Objects kept in event and environment
  - named access in a flat space: event -> Ifd<EmcCluster>::get(“MergedClusters”)
  - implemented via ProxyDict
  - proxies provide complex access when needed
  - ensures physical decoupling
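Named access in a flat space can be sketched as a dictionary keyed by the pair (type, name). This is only an illustration of the idea under simplifying assumptions: `EventStore`, `put`, `get` and `Cluster` are hypothetical names, and the sketch does not reproduce the ProxyDict or `Ifd` interfaces.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reconstructed object; not a BaBar class.
struct Cluster { double energy; };

// Flat dictionary keyed by (type, name). Illustrative only.
class EventStore {
    using Key = std::pair<std::type_index, std::string>;
    std::map<Key, std::shared_ptr<void>> slots_;

public:
    // Store an object under its static type and a user-chosen name.
    template <typename T>
    void put(const std::string& name, std::shared_ptr<T> object) {
        slots_[Key(std::type_index(typeid(T)), name)] = std::move(object);
    }

    // Returns nullptr when nothing of type T was stored under this name.
    template <typename T>
    std::shared_ptr<T> get(const std::string& name) const {
        auto it = slots_.find(Key(std::type_index(typeid(T)), name));
        if (it == slots_.end()) return nullptr;
        // Safe cast: the key guarantees the slot was filled with a T.
        return std::static_pointer_cast<T>(it->second);
    }
};
```

A consumer needs only the data type and the name, for example `event.get<std::vector<Cluster>>("MergedClusters")`, and never links against the module that produced the collection.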
Algorithms
[Figure: logical and physical views of three algorithms exchanging data through the transient data store. Algorithm A takes Data T1 and produces Data T2, T3; Algorithm B takes Data T2 and produces Data T4; Algorithm C takes Data T3, T4 and produces Data T5. In the physical view A, B and C belong to a parent algorithm.]
• An Algorithm knows only which data (type and name) it uses as input and produces as output.
• The only coupling between algorithms is via the data.
• The execution order of the sub-algorithms is the responsibility of the parent algorithm.
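The three statements above can be sketched in a few lines. This is a toy model under stated assumptions, not the GAUDI interfaces: the store holds one number per name, and `DataStore`, `Algorithm`, `runSequence` and `makeChain` are hypothetical names. The A, B, C chain mirrors the data names T1 to T5 with made-up arithmetic.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical transient store: named slots holding one number each.
using DataStore = std::map<std::string, double>;

// A sub-algorithm declares the names it reads and writes, plus its action.
struct Algorithm {
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
    std::function<void(DataStore&)> execute;
};

// The parent fixes the execution order; returns false when an input is missing.
bool runSequence(const std::vector<Algorithm>& sequence, DataStore& store) {
    for (const Algorithm& algorithm : sequence) {
        for (const std::string& name : algorithm.inputs) {
            if (store.count(name) == 0) return false;
        }
        algorithm.execute(store);
    }
    return true;
}

// Builds an A, B, C chain; the arithmetic is invented for the example.
std::vector<Algorithm> makeChain() {
    Algorithm a{{"T1"}, {"T2", "T3"}, [](DataStore& s) {
        s["T2"] = s["T1"] + 1.0;
        s["T3"] = s["T1"] * 2.0;
    }};
    Algorithm b{{"T2"}, {"T4"}, [](DataStore& s) { s["T4"] = s["T2"] * 10.0; }};
    Algorithm c{{"T3", "T4"}, {"T5"}, [](DataStore& s) { s["T5"] = s["T3"] + s["T4"]; }};
    return {a, b, c};
}
```

No algorithm refers to another one; replacing B with a different producer of "T4" leaves A and C unchanged.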
Action on Demand
Compare the results of two different track reconstruction algorithms.
[Figure: request graph. Labels: Event, Detector Element, Hits, Rec Hits, Rec T1, T1, Rec T2, T2, Rec CaloCl, CaloCl, Analysis.]
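A minimal sketch of action on demand, assuming it means that a product is reconstructed only when something first asks for it and is then reused. `LazyEvent`, `addProducer`, `get` and the numeric "products" are hypothetical; the sketch is not the CMS implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical event that builds a product only when somebody asks for it.
class LazyEvent {
public:
    // Register how to build a named product; nothing is computed yet.
    void addProducer(const std::string& name, std::function<double(LazyEvent&)> producer) {
        producers_[name] = std::move(producer);
    }

    // First request runs the producer; later requests reuse the cached value.
    // Throws std::out_of_range if no producer is registered under this name.
    double get(const std::string& name) {
        auto cached = cache_.find(name);
        if (cached != cache_.end()) return cached->second;
        const double value = producers_.at(name)(*this);
        ++producerCalls_;
        cache_[name] = value;
        return value;
    }

    int producerCalls() const { return producerCalls_; }

private:
    std::map<std::string, std::function<double(LazyEvent&)>> producers_;
    std::map<std::string, double> cache_;
    int producerCalls_ = 0;
};

// Two toy "track reconstructions" that share the same reconstructed hits.
LazyEvent makeEvent() {
    LazyEvent event;
    event.addProducer("RecHits", [](LazyEvent&) { return 10.0; });
    event.addProducer("T1", [](LazyEvent& e) { return e.get("RecHits") + 1.0; });
    event.addProducer("T2", [](LazyEvent& e) { return e.get("RecHits") + 2.0; });
    event.addProducer("CaloCl", [](LazyEvent&) { return 99.0; });
    return event;
}
```

An analysis that compares "T1" and "T2" triggers the hit reconstruction once and never pays for "CaloCl", which it did not request.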
StMaker
[Figure: “regular” makers communication. Several StMaker objects, each with .maker, .const and .data branches; labels on the links: AddData(), GetDataSet(). Steps: 1. Init(), 2. Make().]
ALICE's choice
• Migrate immediately to C++
  - immediately abandon PAW
  - but accept GEANT3.21 (initially)
• Adopt the ROOT framework
  - not worried about being dependent on ROOT
  - much more worried about being dependent on G4, Objy, ...
• Allow use of FORTRAN and C++
  - allow starting with wrapping and bad design
• Impose a single framework
  - provide central support, documentation and distribution
  - train users in the framework
Detector Description
Detector Data Store
[Figure: an Algorithm reaches the Transient Detector Store (DetElement1, DetElement2, ...) through the Detector Data Service. The Detector Persistency Service and its Converters connect the transient store to the Persistent Detector Store; the Geant4 Service and its G4Converters connect it to the Geant4 Representation.]
The transient detector store contains a “snapshot” of the detector data valid for the currently processed event.
J.Bogart
LCD
Input: Why Use XML?
• For the 1st pass, LCD used an ad hoc file format and one-of-a-kind code for serial-only parsing of detector geometry.
• XML is a standard meta-language for defining markup languages. Good free parsers exist, more tools coming.
• XML languages are plain-text, self-documenting.
• Application interface to data (XML document) may be serial or random-access.
• Avoid growing private file formats or, worse, hard-coding parameters.
• Make it easy (well, easier) for several programs to use the same input.
Detector Description in XML
<lcdparm>
  <global file="largeParms2.xml" />
  <physical_detector topology="large" id="L2">
    <!-- start of subdetector description -->
    <volume id="EM_BARREL">
      <tube>
        <!-- geometry, materials -->
        <barrel_dimensions inner_r="196.0" outer_z="322.0" />
        <layering n="40">
          <slice material="Pb" width="0.4" />
          <slice material="Tyvek" width="0.05" />
          <slice material="Polystyrene" width="0.1" sensitive="yes" />
        </layering>
        <segmentation cos_theta="300" phi="300" />
      </tube>
      <!-- function -->
      <calorimeter type="em" />
    </volume>
    <!-- end of subdetector description -->
    ...
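As a worked example of what a program can derive from such a description, the sketch below computes the total width of the EM_BARREL stack from its layering: 40 repetitions of slices of width 0.4, 0.05 and 0.1 give 40 × 0.55 = 22.0 in the units of the file, which the slide does not state. The struct and function names are hypothetical, and the values are typed in by hand rather than parsed from XML.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One <slice> element: material, width (units as in the XML file), sensitive flag.
struct Slice {
    std::string material;
    double width;
    bool sensitive;
};

// A <layering n="..."> element: the slice pattern repeated n times.
struct Layering {
    int repeat;
    std::vector<Slice> slices;
};

// Sum of the slice widths in one layer.
double layerThickness(const Layering& layering) {
    double total = 0.0;
    for (const Slice& slice : layering.slices) total += slice.width;
    return total;
}

// Width of the full stack: n identical layers.
double stackThickness(const Layering& layering) {
    return layering.repeat * layerThickness(layering);
}

// Fraction of each layer occupied by sensitive slices.
double sensitiveFraction(const Layering& layering) {
    double sensitive = 0.0;
    for (const Slice& slice : layering.slices) {
        if (slice.sensitive) sensitive += slice.width;
    }
    return sensitive / layerThickness(layering);
}

// The EM_BARREL layering from the slide, entered by hand (no XML parsing here).
Layering emBarrelLayering() {
    return Layering{40, {{"Pb", 0.4, false}, {"Tyvek", 0.05, false}, {"Polystyrene", 0.1, true}}};
}
```

Because the numbers live in the XML document rather than in code, simulation and reconstruction programs can compute such derived quantities from the same input.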
Detector Reconstruction
Track Reconstruction Framework:
Motivation
• We cannot implement the optimal track reconstruction algorithm right away
  - there is probably no single optimal algorithm but several, each optimized for a specific task
  - we need a flexible framework for developing and evaluating algorithms
• The mathematical complexity of track finding/fitting often limits the number of developers
  - the algebra involved is often localized in a few places
  - if we could encapsulate the algebra in a few classes and separate it from the logic of the algorithm, track finding would become easier for developers
Reconstruction Object Model
(BaBar IFR)
• Objects encapsulate the behavior of:
  - reconstruction information (strip, hit, cluster, …)
  - the detector model (sector, layer, …)
  - algorithm strategies (clusterizer, …)
  - etc.
[Figure: strips grouped into a “hit” (1D-cluster); labels: m cluster, p cluster.]
The BaBar Track Fit
• Written in OO C++
• Integrated with the BaBar software framework
• Exploits a novel formulation of the Kalman equations
  - symmetric processing for both track directions
  - processing in parameter and weight space reduces the number of matrix inversions required
  - fit result is expressed as a piecewise helix: joined helix segments describing the ‘most likely’ path through space
• Integrates support for other tracking operations
  - pattern recognition
  - alignment
• Used to fit >10^8 tracks in the commissioning run
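Why working in both parameter and weight space can save inversions may be seen from the generic textbook relations for a linear filter. They are given here as an assumption about what the slide alludes to, not as the BaBar formulation itself; the symbols $x$, $C$, $W$, $\beta$, $H$, $V$, $m$ and $Q$ are introduced only for this illustration.

```latex
% Two equivalent descriptions of the same track state:
% parameters x with covariance C, or weight matrix W with information vector beta
W = C^{-1}, \qquad \beta = W x

% Adding a measurement m (projection H, measurement covariance V)
% is a plain addition in weight space
W' = W + H^{T} V^{-1} H, \qquad \beta' = \beta + H^{T} V^{-1} m

% Adding process noise Q (e.g. scattering in material)
% is a plain addition in parameter space
x' = x, \qquad C' = C + Q
```

Each kind of effect is additive in one of the two representations, so the state matrix needs to be inverted only when switching between them, not at every step.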
Effect Processing
Code Organization
[Figure: class diagram. Group labels: General BaBar Tracking, Kalman Specific, CLHEP. Classes: TrkRecoTrk, TrkRep, KalMaker, KalRep, KalSite (inwards and outwards, with a lazy cache of KalParams and KalWeight), KalHit, KalBend, KalMaterial, PiecewiseTrajectory, Helix Trajectory, HepVector, HepSymMatrix.]
KalStub:
A Pattern Recognition Tool
Experience with software
development (BaBar IFR)
• Inflexible design was spotted when problems repeatedly occurred in the same code areas while introducing changes
• Applying a more flexible design has usually improved the software management
  - more effective development
  - problem isolation
• A concrete example: computation of the number of interaction lengths
  - abstract base class for cluster curve approximation
  - the path-length computation in the detector model was tested using a straight-line implementation of the curve approximation
  - polynomial approximation from a fit in each view was implemented separately
  - the integration of the two pieces was immediately successful
Simulation
Geant4 Capabilities
Very powerful Geant4 kernel
tracking, stacks, geometry, hits, ..
Extensive & transparent physics models
electromagnetic, hadronic, …
extended energy range, new models
Persistency, Visualization, ...
Surpasses Geant-3
in nearly every respect
X-Ray Surveys of Asteroids and Moons
• Incident radiation: cosmic rays, jovian electrons; solar X-rays, e, p
• Induced X-ray line emission: indicator of target composition (~100 mm surface layer)
[Figure: illustration with solar image, courtesy SOHO EIT. Plot labels: Geant3.21, ITS3.0, EGS4; Geant4, C, N, O line emissions included.]
ESA Space Environment & Effects Analysis Section
Hadronic shower models in Geant4
Typical example of OO design
• Highly structured and layered object model (inheritance tree): at each level a given set of functionalities is made concrete, which will be common to a given branch
  - 1st level: calculation of cross-sections and final states for particles in flight and at rest in a medium
  - 5th level: implementation of the fragmentation function for string decay
• Results in a flexible framework for implementing new hadronic interaction models
Changing cuts
• Results very stable with variation of cuts between 10 mm and 50 microns
  - even track length
• Also see shower profiles for different cuts (next slide)
CMS Geometry Model using GEANT4
Categories based on responsibilities:
• Geometry categories: CMS-specific, OSCAR (Geant4) and Persistent
• Hits categories: CMS and OSCAR
• User Interaction categories: User Actions, GUI
• Utilities: Materials, Rotation Matrices
ATLAS Accordion Calorimeter
• G3: 0.5 Megabytes, 10 seconds*SPECint95/GeV
• Static geometry: 110 Megabytes of memory, CPU time 9.5 seconds*SPECint95/GeV
• Parameterized geometry: 1500 seconds*SPECint95/GeV (1D voxelization)
• Tailored geometry (G4Accordeon): 8 Megabytes of memory, CPU time 11.5 seconds*SPECint95/GeV
ATLAS Calorimeter
• The first results on EM shower simulations are close to test beam and GEANT3 results, but more work is needed to understand the differences.
• GEANT4 performance comparable to that of GEANT3 can be achieved.
• The design of GEANT4 allows a user to extend GEANT4 functionality. This helps to implement the new idea of a “tailored” geometry description that can be used for high-performance simulation of any calorimeter or other regular structure.
The Virtual MC
[Figure: Detector Code and AliRun use the abstract AliMC interface, implemented by TGeant3 (G3 geometry), TGeant4 (G4 geometry, via G3toG4) and TFluka.]
Tracking schema
Inverse Framework plug-in
[Figure: step handling. Labels: GUSTEP, Geant4 StepManager, FLUKA Step, AliRun::StepManager, Module Version StepManager, Add the hit, Disk I/O, Root.]
StdHepC++
• There is a strong need for a standard C++ Monte Carlo generator interface.
• StdHepC++ is a natural object-oriented implementation of such an interface.
• At present we have working examples which integrate StdHepC++ with the Fortran versions of Herwig, Pythia and Isajet.
• On the other side, StdHepC++ provides event blocks readable by MCFast and Geant3, and will have an interface to Geant4.
LHC++: what it is (I)
Modular replacement of current CERNLIB for use in HEP experiments:
• memory management (C++)
• persistency (“I/O”)
• mathematical library
• foundation classes
• random number generators
• histogramming
• fitting
• simulation
LHC++
Present configuration
• Object persistency
  - from RD45 collaboration (Objectivity/DB)
• Foundation classes
  - HEP-specific foundation classes (CLHEP)
  - random number generators (CLHEP)
• Mathematical library from NAG (NAG_C)
  - covers a broad range of functionality
  - extensions required by CERN will be added in the next release (Mark 6)
  - quality assurance
LHC++ Present configuration (cont.)
• Simulation: GEANT-4
  - worldwide collaboration
  - complete OO design
• Histogramming: HTL
• Fitting: Gemini, HepFitting packages
  - interface to any minimizer (at present: NAG, Minuit)
• Event generators
  - Lund people started Pythia-7 (C++)
  - StdHep++ in the process of becoming part of CLHEP
LHC++
[Figure: packages and dependencies.]
User requirements
for a physics analysis tool
• Easy to use for the “end user”
  - “like PAW”
• Foresee customization/integration with respect to existing frameworks of experiments
  - e.g., use persistency/messaging/... from the experiment
  - needs to be compatible with the experiment’s framework
• Plan for extensions
• Maximize flexibility/interoperability
  - “plug-and-play-like” use of components from other frameworks (shared libs using the same interfaces)
Abstract Interfaces for Data Analysis
• AIDA project started by the HepVis’99 workgroup: Abstract Interfaces for Data Analysis
  http://wwwinfo.cern.ch/asd/lhc++/AIDA/index.html
• In close collaboration with users and developers from experiments and providers of other packages
  - Iguana, HippoDraw, JAS, OpenScientist
• Starting with Histogram classes
  - presently in final iteration
• Next items are Ntuples, Vectors and Fitting
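What an abstract histogram interface buys the user can be sketched as below. The class and method names (`AbstractHistogram1D`, `FixedBinHistogram`, `sumOfHeights`) are hypothetical and are not the AIDA interfaces; the sketch only assumes that analysis code is written against a pure interface while implementations come from different packages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical abstract interface; names are illustrative, not the AIDA API.
class AbstractHistogram1D {
public:
    virtual ~AbstractHistogram1D() = default;
    virtual void fill(double x, double weight) = 0;
    virtual std::size_t bins() const = 0;
    virtual double binHeight(std::size_t index) const = 0;
};

// One possible implementation: equal-width bins on [low, high);
// fills outside the range are dropped.
class FixedBinHistogram : public AbstractHistogram1D {
public:
    FixedBinHistogram(std::size_t bins, double low, double high)
        : heights_(bins, 0.0), low_(low), high_(high) {}

    void fill(double x, double weight) override {
        if (x < low_ || x >= high_) return;
        const double width = (high_ - low_) / heights_.size();
        std::size_t index = static_cast<std::size_t>((x - low_) / width);
        if (index >= heights_.size()) index = heights_.size() - 1;  // guard against rounding
        heights_[index] += weight;
    }
    std::size_t bins() const override { return heights_.size(); }
    double binHeight(std::size_t index) const override { return heights_.at(index); }

private:
    std::vector<double> heights_;
    double low_;
    double high_;
};

// Analysis code written against the interface only.
double sumOfHeights(const AbstractHistogram1D& histogram) {
    double total = 0.0;
    for (std::size_t i = 0; i < histogram.bins(); ++i) total += histogram.binHeight(i);
    return total;
}
```

`sumOfHeights` would work unchanged with any other implementation of the interface, which is the kind of interoperability the user requirements ask for.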
Conclusions
• In response to the challenges posed by the new physics program and the expectations of the user community, all major experiments are investing in flexible and powerful software architectures based on frameworks (not just main & subroutines)
  - many commonalities
  - several qualifying differences: thrust on novel technologies and their impact on physicists
• Specialized sub-frameworks for detector reconstruction, detector description, physics process simulation
Conclusions
• Experience with OO is no longer confined to a few gurus and their prophets
  - clear evidence that well-engineered OO software is much easier to adapt, extend and interface in response to evolving requirements
• Near future (CHEP 2001?)
  - consolidation of current architectures
  - common approach to basic computing services
• Next challenge: customer satisfaction
  - physicists analyzing data from their desk using all the power they expect from new computing technologies