Abstract:
For more than one hundred years, astronomers have had access to large data sets. A typical photographic plate contains 100 Megabytes of data. But these old analog optical ROMs, accessible only by traveling to a central depository, have been replaced by digital data accessible over a world-wide network from desktop workstations everywhere. New techniques for collecting data, storing and cataloging data, simulating data, and analyzing data are needed to extract a mother lode of scientific insight from the flood of available bits.
Abstract:
An overview will be given of the Bayesian basis of inversion and regularization procedures. The conceptual basis of the maximum entropy method (MEM) will be discussed in particular, and extensions to positive/negative and complex data highlighted. Other deconvolution methods can also be discussed within the Bayesian context, and Wiener filtering, Massive Inference and the Pixon method will be contrasted and compared, with examples from both astronomical and non-astronomical applications.
Abstract:
Redshift surveys of galaxies have proven to be central to progress in many questions of Large Scale Structure and in tests of cosmological models. The surveys of the past decades gave us a feeling for the nature of structure in the Universe, while large ongoing surveys will provide definitive measures of a host of detailed questions concerning the present state of clustering. Progress toward understanding the evolution of galaxy clustering has been slower because of the requirement for large telescopes. To address this important issue, new spectrographs with higher multiplexing capability are nearing completion for the Keck and the VLT telescopes; these new instruments will be used to study galaxy clustering at z=1 with fidelity comparable to existing surveys at z=0. This talk will focus on scientific goals, as well as design and data management issues for these surveys.
Abstract:
The 2dF QSO Redshift Survey now has measured redshifts for 10000 / 25000 QSOs. The status of the survey will be reviewed. The latest results on the evolution of the QSO Luminosity Function and on the evolution of QSO clustering correlation function will be described. We shall also demonstrate the power of the completed survey to place constraints on the QSO power spectrum in both real and redshift space, using mock 2dF QSO catalogues drawn from the massive Hubble Volume simulation of the Virgo Consortium. We shall further describe the potential of the survey to place exciting new limits on the cosmological constant via a geometric test and on the cosmological density parameter via QSO lensing.
Abstract:
We analyze known clusters of galaxies in the 2dF galaxy redshift survey. We consider the reality of Abell, APM and EDCC clusters in 3D space, study the level of contamination by background and foreground structure and derive an estimate for the space density of clusters and the distribution of velocity dispersions. Our result shows that clusters of high velocity dispersion are rare and favor cosmologies with low matter densities. We also consider the spectral indices of galaxies in clusters, filaments, the field and nearby dwarfs included in 2dFGRS. We find that the age-metallicity distribution of galaxies in clusters and in the structure associated with them is not distinguishable, suggesting that `nature' rather than `nurture' is responsible for the characteristics of cluster galaxies. Field galaxies have age-metallicity distributions only slightly removed from those of cluster giants whereas dwarfs are radically different and appear to form a different population. Finally, we study the orientation of galaxies with respect to the position angle of galaxies in the filament and we find no evidence for alignments, unlike the case for brightest cluster galaxies.
Abstract:
The Northern Sky Optical Cluster Survey is a project to create an objective catalog of galaxy clusters over the entire high--galactic--latitude Northern sky. We use the object catalogs generated from the Digitized Second Palomar Sky Survey as the basis for this survey. We apply a color criterion to select against field galaxies, and use a simple adaptive kernel technique to create galaxy density maps, combined with the bootstrap technique to generate significance maps, from which density peaks are selected. We present the details of our cluster detection technique, as well as some initial results for two small areas totaling $\sim60$ square degrees. We find a mean surface density of $\sim 1.5$ clusters per square degree, consistent with the detection of richness class 0 and higher clusters to $z\sim0.3$. Follow-up CCD imaging and multiobject spectroscopy of a complete sample is underway, and a 2500 square degree area is in progress. In addition, we demonstrate an effective photometric redshift estimator for our clusters.
Abstract:
The construction and analysis of X-ray cluster surveys to be used for the measurement of large-scale structure statistics is described using the example of the REFLEX Cluster Survey. We explain the sample selection process based on X-ray as well as optical data and discuss various tests to assess the sample completeness. Statistical analyses are controlled by large-scale N-body simulations. Two-point correlation functions and power spectra are applied to describe the spatial distribution of the clusters and to constrain models of cosmic structure formation. We evaluate the slope of the cluster power spectrum on very large scales and the biasing differences for cluster subsamples with different X-ray luminosities using the Fourier and Eigenmodes of the REFLEX data.
Abstract:
We present a joint European X-ray/optical/radio project to study the large scale structure of the universe out to a redshift of one, over an 8x8 sq.deg. area. Using the galaxy cluster/group population to be discovered by XMM, we shall be in a position to compute, for the first time, the cluster correlation function in two redshift bins. The 3D maps of clusters and QSOs will be compared with the mass distribution inferred from a weak lensing analysis on the same area and with an associated S-Z survey.
Abstract:
The Roentgensatellite ROSAT has conducted the first all-sky survey in X-rays with an imaging telescope, leading to a major increase in sensitivity and source location accuracy. In addition, more than 9,000 pointed observations with the position sensitive proportional counter (PSPC) and the high resolution imager (HRI) have been performed. In total more than 150,000 X-ray sources have been detected, raising the previously known number of sources by more than a factor of 20. Among the various object classes observed are comets, stars, white dwarfs, cataclysmic variables, neutron stars, black hole candidates, supernova remnants, nearby galaxies, AGN, and clusters of galaxies. Many of these objects are known to be time variable sources. Various catalogues of sources detected in the ROSAT All-Sky Survey and in the pointed observations have been produced and are used as the basis for identifying and studying the properties of the various ROSAT sources. The talk will present the catalogues and the basic ROSAT data files which can be used to further explore the X-ray sky. In conjunction with data from other wavelength domains, the ROSAT data offer a unique opportunity for further mining of the sky.
Abstract:
The Galaxy Evolution Explorer (GALEX) is a Space Ultraviolet Imaging and Spectroscopic Survey Small Explorer mission that will map the star formation history of the universe over 0<z<2, a key to our understanding of the formation and evolution of galaxies and the origins of stars and heavy elements. It will provide the critical, missing survey of the z=0 UV universe, bridge the gap between the z=0 and redshifted UV universe explored by large optical telescopes, the Hubble Deep Field and NGST, and provide the framework and targets for numerous high-priority (HST, FUSE, AXAF, and SIRTF) investigations. An Associate Investigator Program will provide opportunities to fully exploit the mission and data potential. In addition to an All-Sky survey in two bands (1500 and 2400 Angstroms), GALEX will perform a deep imaging survey of 160 square degrees to m(AB)=26, and spectroscopic surveys over a range of depths and sky coverages. These regions are being selected in coordination with other ongoing and planned surveys at other wavelengths, including the visible, near and far-IR and x-ray. GALEX uses the space ultraviolet to simultaneously measure redshift (using metal lines and the Lyman break), extinction (using the UV spectral slope), and star formation rate (using the UV luminosity which is proportional to the instantaneous star formation rate). Slitless grism spectroscopy is highly efficient, providing 100,000 galaxy spectra in one year. The 50 cm telescope, operating from 1350-3000 Angstroms exploits high-resolution, large-format microchannel-plate detectors and optical coatings to attain the deep, broad-band imaging and spectroscopy required.
Abstract:
Techniques based on Voronoi and Delaunay tessellations for the analysis of discrete spatial datasets yield a promising way to deal with the dichotomy between discrete datasets and interpretation in terms of volume-averaged theoretical predictions based on underlying continuum fields. We describe the prescriptions to use Voronoi and in particular Delaunay tessellations defined by sample points, be it N-body particles or observed galaxies, to set properly defined density and velocity fields. The successful reproduction of statistical distributions of nonlinear velocity fields is described, followed by the introduction of the results of Delaunay density field estimation in N-body simulations. Without resorting to artificially defined grids the method manages to trace the density field at an optimal resolution and defined at every point in the sample volume. Very high-density confines of dense dark matter halos are fully resolved, low-density void regions are no longer dominated by shot noise, and anisotropic filamentary and wall-like features are objectively reproduced. We conclude with a discussion of applications of these techniques for analyzing N-body simulations and for analysis of selection-affected galaxy catalogues.
Abstract:
Astrophysics provides prime examples of discrete and continuous random fields, such as the distribution of galaxies on the sky, and Cosmic Microwave Background (CMB) maps. The clustering of such fields encodes information on the underlying theory of structure formation, such as the initial conditions, subsequent linear and non-linear gravitational amplification, as well as the physical processes of galaxy formation, and can be characterized by their two-point and higher order correlation functions. Recently, new algorithms based on a KD-tree approach revolutionized the measurement of correlation functions, allowing analyses of the future wide field galaxy surveys, as well as megapixel CMB maps. We present preliminary results of the applications of these new techniques to available data.
Abstract:
The theory of cosmic errors allows one to estimate analytically the uncertainties on statistics measured in galaxy catalogs and to find the best estimators. If the cosmic errors are Gaussian distributed, it is possible to use a maximum likelihood approach to constrain models of large scale structure by joint measurements of moments of the count-in-cell distribution function.
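As a rough illustration of what a moment of the count-in-cell distribution looks like in practice, the sketch below computes the mean count and the cell-averaged two-point correlation from the second factorial moment, which removes Poisson shot noise. The function name and the toy counts are hypothetical and are not taken from the talk; the cosmic error theory itself is not reproduced here.

```python
def count_in_cell_moments(counts):
    """Return the mean count and the shot-noise-corrected variance xi_bar."""
    n = len(counts)
    mean = sum(counts) / n
    # Second factorial moment <N(N-1)>. For a Poisson-sampled field it equals
    # mean^2 * (1 + xi_bar), so the discreteness term drops out.
    f2 = sum(c * (c - 1) for c in counts) / n
    xi_bar = f2 / mean ** 2 - 1.0
    return mean, xi_bar


# Hypothetical galaxy counts in eight equal-size cells.
counts = [0, 2, 1, 5, 3, 0, 1, 4]
mean, xi_bar = count_in_cell_moments(counts)
print(mean, xi_bar)
```

Higher-order moments follow the same pattern with higher factorial moments; a likelihood analysis would additionally need the covariance between these estimates.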
Abstract:
In the standard picture of structure formation, initially random-phase fluctuations are amplified by non-linear gravitational instability to produce a final distribution of mass which is highly non-Gaussian and has highly coupled Fourier phases. Second-order statistics, such as the power spectrum, are blind to this kind of phase association. I discuss the information contained in the phases of cosmological density fluctuations and their possible use in statistical analysis tools. In particular, I show how the bispectrum measures a particular form of phase association called quadratic phase coupling, and show how to visualise phase association using colour models. These techniques offer the prospect of more complete tests of initial non-Gaussianity than those available at present.
Abstract:
A method to handle nonlinear effects in a likelihood analysis of peculiar velocities is described. A principal-component analysis allows an evaluation of goodness-of-fit. New constrained simulations with galaxy identifications (GIF) allow proper mock catalogs for testing these methods. The results are revised values for the cosmological parameters (e.g. $\Omega_m = 0.35 \pm 0.1$ for LCDM), closer to the values obtained by other data and methods. There is an indication for missing power (``cold flow'') on scales corresponding to the weak second peak in the CMB power spectrum.
Abstract:
Minkowski functionals constitute a complete family of morphometric descriptors. Beyond the well-known scalar Minkowski functionals, vector- and tensor-valued generalizations allow for a directional analysis which is also sensitive to symmetry. Thus, they are well-suited for quantifying the substructure of galaxy clusters and for testing the isotropy and homogeneity of large-scale structure. In my talk I will give a short introduction to generalized Minkowski functionals and review recent applications to galaxy clusters.
Abstract:
We review recent methods for compression and classification of galaxy images and spectra in large data sets such as 2dF and SDSS. We describe supervised and unsupervised methods such as Principal Component Analysis, Fisher Matrix, Artificial Neural Networks and Information Bottleneck. We discuss how these methods can be used to study physical processes of galaxy formation, clustering and galaxy biasing in the new big redshift surveys.
Abstract:
A general approach to wavelets is presented within the framework of a separable functional Hilbert space H. The basic tool is the construction of H-product kernels by use of Fourier analysis with respect to an orthonormal basis in H. Wavelets are shown to be 'building blocks' that decorrelate the data. A pyramid scheme provides fast computation. Regularization in inverse problems is formulated as multiresolution analysis. The method of selective multiscale approximation is outlined for the case of noisy data.
Abstract:
This talk is about the algorithmic challenges involved in allowing astrophysicists to continue using the modeling and inference tools they've been happily applying to kilobytes of data, when they start drawing in gigabytes of data. I will try to give a roadmap to the various literatures and tools from computational geometry, numerical analysis, databases, data mining and artificial intelligence. The technical meat of the talk will highlight some new algorithms and data structures that fall into the class of "cached sufficient statistics." These are summary data structures that live between the statistical algorithm and the database, intercepting the kinds of operations that have the potential to eat up valuable time if they were answered by direct reading of the dataset. Some structures may be familiar (kd-trees and R-trees, for example) while some are new (All-dimensions trees, and the Anchors Hierarchy for high dimensions), but for all structures we introduce new search algorithms operating on the cached structures that have interesting properties which call for further development. I will give some computer demonstrations showing various classes of accelerations broadly covering kernel density speedups (1000-fold), clustering speedups (1000-10000 fold), anomaly detection (100-fold) and 2-, 3-, 4- and 5-point correlation function computation (100-fold up to about a trillion-fold). We will also show how this can be applied to other problems in spatial filament and sheet identification, probabilistic red-shift modeling and fast high-dimensional wavelet transforms. In collaboration with: Alex Gray, Scott Davies, Dan Pelleg, Mary Soon Lee, Remi Munos, Jeff Schneider, Bob Nichol, Andy Connolly (U Pitt), Alex Szalay (JHU). Related papers and information: www.cs.cmu.edu/~AUTON
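The idea of a cached sufficient statistic can be sketched with a small kd-tree in which every node stores its point count and bounding box. A range-count query, the building block of pair counting for correlation functions, can then accept or reject whole nodes without visiting their points. This is only a minimal illustration under stated assumptions: the class and function names are hypothetical, and it is not the algorithm or code presented in the talk.

```python
import random


class Node:
    """kd-tree node that caches its point count and bounding box."""

    def __init__(self, points, depth=0, leaf_size=8):
        dim = len(points[0])
        self.count = len(points)  # cached sufficient statistic
        self.lo = [min(p[k] for p in points) for k in range(dim)]
        self.hi = [max(p[k] for p in points) for k in range(dim)]
        self.points, self.children = points, []
        if len(points) > leaf_size:
            axis = depth % dim
            pts = sorted(points, key=lambda p: p[axis])
            mid = len(pts) // 2
            self.children = [Node(pts[:mid], depth + 1, leaf_size),
                             Node(pts[mid:], depth + 1, leaf_size)]


def count_within(node, q, r):
    """Count points within distance r of q, pruning with the cached boxes."""
    box = list(zip(q, node.lo, node.hi))
    # Squared distances from q to the nearest and farthest corner of the box.
    near = sum(max(lo - x, 0.0, x - hi) ** 2 for x, lo, hi in box)
    far = sum(max(abs(x - lo), abs(x - hi)) ** 2 for x, lo, hi in box)
    if near > r * r:
        return 0            # box lies entirely outside the search ball
    if far <= r * r:
        return node.count   # box lies entirely inside: use the cached count
    if not node.children:   # leaf that straddles the boundary: test each point
        return sum(1 for p in node.points
                   if sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r)
    return sum(count_within(child, q, r) for child in node.children)


# Toy data: 500 uniformly distributed points in the unit square.
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(500)]
tree = Node(pts)
print(count_within(tree, (0.5, 0.5), 0.2))
```

The result is identical to a brute-force count; the saving comes from the two early returns, which replace a scan over many points by a single cached number.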
Abstract:
A major source of data in astronomy in the coming years will be from large, uniform, digital sky surveys covering a range of wavelengths, with pixel information content measured in Terabytes (or even Petabytes). Even the already existing surveys are detecting ~ 10^8 - 10^9 sources with ~ 10 - 100 parameters per source per band. One of the principal tasks of the virtual observatories which will be used to analyse such data in the future will be to combine multiple surveys, e.g., from different wavelengths. Thus, we will have an enormous increase both in the data volume and data complexity. Novel data mining (or KDD) techniques must be introduced in order to fully exploit the scientific potential of these enormous data sets. This will include supervised and unsupervised clustering and classification methods which can partition the data into objectively defined subclasses of objects (e.g., stars, galaxies, quasars, etc.), and which may lead to the discoveries of previously unknown types of objects, selected as outliers in some data parameter space. Even more challenging problems are in automated pattern recognition in the image domain; simple examples include an automated morphological classification of galaxies, searches for gravitational lenses (arcs), etc. We will illustrate some initial experiments along these lines and discuss prospects for future work.
Abstract:
With the new generation of galaxy redshift surveys we have the opportunity to understand the clustering and physical properties of galaxies as a function of their spectral type. For example, the Sloan Digital Sky Survey (SDSS) will produce a sample of 1,000,000 galaxy spectra with a median redshift of z=0.15 and a spectral resolution of 3A. We present here a series of techniques for automated spectral classification that are being developed for the SDSS survey. These range from a hierarchical Karhunen-Loeve approach to classification techniques built on wavelet decomposition. We show how the classification of the existing SDSS spectra correlates with the physical properties of the galaxies and discuss the future direction of these classification approaches.
Abstract:
The 2dF galaxy redshift survey has already yielded over 100,000 spectra (out of the planned 250,000). We present a method for the automated spectral classification of 2dF and other large redshift surveys using a new continuous parameter, based on a Principal Component Analysis. This parameter is designed to be robust to instrumental uncertainties and reflects the absorption/emission strength of a galaxy. Relationships between this parameter and physical properties of galaxies are also investigated.
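A continuous classification parameter of this general kind can be illustrated by projecting each mean-subtracted spectrum onto the leading principal component. The sketch below finds that component by power iteration on toy three-bin "spectra". The function name, the toy data and the choice of power iteration are assumptions made for illustration; the abstract does not specify how the actual 2dF parameter is constructed.

```python
import math


def first_principal_component(spectra, iterations=200):
    """Return (mean spectrum, leading eigenvector, projection of each spectrum)."""
    n, m = len(spectra), len(spectra[0])
    mean = [sum(s[j] for s in spectra) / n for j in range(m)]
    centred = [[s[j] - mean[j] for j in range(m)] for s in spectra]
    v = [1.0 / math.sqrt(m)] * m  # arbitrary start; must not be orthogonal to the answer
    for _ in range(iterations):
        # Apply the covariance matrix as X^T (X v) without forming it explicitly.
        proj = [sum(row[j] * v[j] for j in range(m)) for row in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    proj = [sum(row[j] * v[j] for j in range(m)) for row in centred]
    return mean, v, proj


# Toy spectra: a flat continuum plus a line of varying strength in the middle bin
# (negative values mimic absorption, positive values emission).
spectra = [[1.0, -1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 2.0, 1.0], [1.0, 3.0, 1.0]]
mean, component, parameter = first_principal_component(spectra)
print(component, parameter)
```

In this toy case the component picks out the line bin, and the projection orders the spectra from strongest absorption to strongest emission.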
Abstract:
The Hamburg/ESO survey (HES) is an objective prism survey covering the total southern ($\delta < +2.5 deg$) extragalactic ($|b| > 30 deg$) sky in the magnitude range $12 < B < 17.5$. Its main aim is to find bright quasars. However, at its spectral resolution of $15$\,{\AA} at H$\gamma$, many stellar absorption lines are detectable, so that the HES data base of $\sim 4$ million digital spectra is a valuable source for many different types of interesting stellar objects, e.g., extremely metal-poor stars, dwarf and giant carbon stars, or white dwarfs. We will present methods we use in the HES for selecting such stars; namely, Bayes' rule and minimum cost rule classification. We will discuss the advantages of these methods as compared to other techniques of automatic spectral classification, and we will present first results from follow-up observations.
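The difference between the two decision rules named above can be shown with a two-class toy example: Bayes' rule assigns the class with the highest posterior probability, whereas the minimum cost rule weights the posteriors by the penalty of each kind of mistake. All priors, likelihoods, costs and class labels below are invented for illustration and are not HES values.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem: P(class | data) from priors and p(data | class)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


def bayes_rule(post):
    """Assign the class with the highest posterior probability."""
    return max(post, key=post.get)


def minimum_cost_rule(post, cost):
    """Assign the class with the lowest expected cost.

    cost[assigned][true] is the penalty for choosing `assigned`
    when the object really belongs to class `true`.
    """
    expected = {a: sum(cost[a][t] * post[t] for t in post) for a in cost}
    return min(expected, key=expected.get)


# Hypothetical numbers: rare targets, and a missed target costs ten times
# more than an unnecessary follow-up observation.
priors = {"normal": 0.99, "metal_poor": 0.01}
likelihoods = {"normal": 0.02, "metal_poor": 0.60}
cost = {"normal": {"normal": 0.0, "metal_poor": 10.0},
        "metal_poor": {"normal": 1.0, "metal_poor": 0.0}}

post = posteriors(priors, likelihoods)
print(bayes_rule(post), minimum_cost_rule(post, cost))
```

With these numbers the posterior still favours the common class, so Bayes' rule rejects the candidate, while the minimum cost rule selects it because missing a rare object is penalised more heavily.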
Abstract:
We present progress results of a new survey for bright QSOs (V<14.5, R<15.4, B_J<15.2) covering the whole sky at |b|>30, based on the USNO, GSC, DSS and RASS datasets. The surface density of QSOs brighter than B_J=14.8 turns out to be 2.9 +- 0.8 10^{-3} deg^{-2}. The optical Luminosity Function at 0.04 < z < 0.3 shows significant departures from the standard pure luminosity evolution, providing new insight into the modelling of the QSO phenomenon.
Abstract:
A general Bayesian algorithm for the reconstruction of the large scale structure from sparse and noisy data with partial sky coverage is presented. The algorithm consists of the following steps: a. Maximum likelihood analysis to estimate the cosmological and model parameters given the data. b. Principal component analysis is used to perform a goodness-of-fit analysis of the most probable model. c. A Wiener filter is applied to the data to get an estimator of the underlying density and velocity fields. d. Constrained realizations are used to probe the scatter around the Wiener filter reconstructed fields. This algorithm is applied here to the MARK III, SFI and ENEAR radial velocity surveys.
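Steps c and d can be sketched in the simplest possible setting, where signal and noise are uncorrelated between modes so that the Wiener filter reduces to a per-mode weight S/(S+N). The constrained realization below adds the Wiener-filtered residual between the real data and a mock data set to a random field, a construction usually attributed to Hoffman and Ribak. The diagonal covariances, function names and numbers are simplifying assumptions; the surveys named above require full covariance matrices.

```python
import random


def wiener_filter(data, signal_var, noise_var):
    """Per-mode Wiener estimate s_WF = S / (S + N) * d."""
    return [s / (s + n) * d for d, s, n in zip(data, signal_var, noise_var)]


def constrained_realization(data, signal_var, noise_var, rng):
    """Draw one field that scatters around the Wiener-filter reconstruction."""
    # Random signal and mock data drawn from the assumed prior and noise model.
    s_rand = [rng.gauss(0.0, s ** 0.5) for s in signal_var]
    d_rand = [s + rng.gauss(0.0, n ** 0.5) for s, n in zip(s_rand, noise_var)]
    # Correct the random field by the Wiener-filtered data residual.
    residual = [d - dr for d, dr in zip(data, d_rand)]
    correction = wiener_filter(residual, signal_var, noise_var)
    return [s + c for s, c in zip(s_rand, correction)]


data = [2.0, 4.0]
signal_var = [1.0, 3.0]
noise_var = [1.0, 1.0]
print(wiener_filter(data, signal_var, noise_var))
print(constrained_realization(data, signal_var, noise_var, random.Random(0)))
```

Modes with low signal-to-noise are pulled toward zero by the filter; the spread of many constrained realizations indicates how uncertain the reconstruction is in each mode.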
Abstract:
In most imaging surveys, detecting sources and classifying them in an automatic way may be seen as the very first stage of the data-mining process. After presenting the general aspects of image detection and classification in the astronomical context, I review the different techniques used in current projects. Emphasis is put on promising developments such as artificial neural networks and vision models.
Abstract:
A method for identifying stars, galaxies and quasars in multi-color surveys is presented. It uses a library of >65000 color templates for classification and redshift estimation of extragalactic objects. After applying it to the object list of the Calar Alto Deep Imaging Survey (CADIS) the results were checked by spectroscopic identifications and showed virtually no classification mistakes at magnitudes of R < 22. For optimization of future surveys, the performance of three model surveys with different broad or medium-band filters but identical telescope time consumption was investigated. We found medium-band surveys to be equivalent or superior to broad-band surveys in terms of classification and redshift estimation although the individual filters go considerably less deep. This conclusion is discussed by means of simulations as well as analytic arguments, while considering both ideal survey conditions and practical limitations.
Abstract:
We present a new package based on neural networks (NExt or Neural Extractor) which seems capable of solving most of the problems raised by the use of traditional packages in object detection, object deblending and star/galaxy classification. The most relevant aspects of NExt are: i) NExt does not make any a-priori assumption on what “an object is” but just assumes the minimal definition of two adjacent connected pixels; ii) NExt does not require any fine tuning of the detection and classification parameters; iii) to perform star/galaxy classification NExt does not use any arbitrarily defined set of features but rather it selects on objective grounds the most significant ones; iv) NExt includes a deblending loop which effectively deals with multiple objects even at the completeness limit of the raw data. Extensive testing shows that NExt i) performs better than any available software (much smaller number of spurious detections and at least equivalent completeness); ii) is fully modular and therefore can be included in any data reduction pipeline; iii) is suitable for processing large data sets. Planned developments are the implementation of neural modules for the automatic detection of elongated tracks (space debris, asteroids, etc.), for the detection of strongly lensed objects and for intelligent data mining in the catalogue space (matching algorithms based on fuzzy logic, search for peculiar and rare objects, etc.).
Abstract:
Periodicity analysis of unevenly collected data is a relevant issue in several scientific fields. In astrophysics, for example, we have to find the fundamental period of light or radial velocity curves which are unevenly sampled observations of stars. Classical spectral analysis methods are unsatisfactory for solving this problem. In this paper we present a neural network based estimator system which performs frequency extraction well in unevenly sampled signals. It uses an unsupervised Hebbian nonlinear neural algorithm to extract from the signal the principal components which, in turn, are used by the MUSIC frequency estimator algorithm to extract the frequencies. The neural estimator is tolerant to noise and also works well with few points in the sequence. We benchmark the system on real signals with the Periodogram, the DCDFT and ESPRIT. Finally, we present some experimental results of the application of the neural estimator to cyclostratigraphy, regarding the detection of Earth orbital (Milankovic') periodicities recorded in Cretaceous shallow water carbonate sequences outcropping in Southern Apennines (Italy).
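For orientation, the classical baseline for unevenly sampled data can be sketched as a Lomb-Scargle periodogram. This is not the neural estimator described above, only an illustration of the kind of periodogram such estimators are compared against, and it is an assumption that the benchmark "Periodogram" is of this form. The test signal, sampling and frequency grid are invented.

```python
import math
import random


def lomb_scargle(t, y, freqs):
    """Normalised Lomb-Scargle power at each trial frequency."""
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / (len(y) - 1)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2.0 * w * ti) for ti in t),
                         sum(math.cos(2.0 * w * ti) for ti in t)) / (2.0 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum((v - mean) * ci for v, ci in zip(y, c))
        ys = sum((v - mean) * si for v, si in zip(y, s))
        power.append((yc ** 2 / sum(ci ** 2 for ci in c)
                      + ys ** 2 / sum(si ** 2 for si in s)) / (2.0 * var))
    return power


# Noise-free sinusoid with frequency 0.1, observed at 60 random epochs.
rng = random.Random(0)
t = sorted(rng.uniform(0.0, 100.0) for _ in range(60))
y = [math.sin(2.0 * math.pi * 0.1 * ti) for ti in t]
freqs = [0.01 + 0.002 * k for k in range(200)]
power = lomb_scargle(t, y, freqs)
best = freqs[power.index(max(power))]
print(best)
```

For this clean toy signal the strongest peak should lie close to the input frequency; the difficulties referred to in the abstract arise with noisy, short or multi-periodic sequences.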
Abstract:
I will review the current status of the methodology for CMB data analysis and the scientific interpretation of experimental results with particular emphasis on problems related to forthcoming large data sets to be produced by the satellite missions MAP and Planck. Some of the subjects to be discussed include (1) power spectrum estimation, related problems with the application of conventional algorithms, and the necessity for new methods which would scale sufficiently well (CPU and memory) to permit successful application to MAP and Planck data, (2) formal issues of map making, noise estimation, and component separation in relation to instrumental design and scanning strategy, and (3) intercomparison of heterogeneous data sets.
Abstract:
Realizing the extraordinary scientific potential of the Cosmic Microwave Background requires precise measurements of its tiny anisotropies over a significant fraction of the sky at very high resolution. However, the analysis of the resulting datasets presents a serious computational challenge. Brute force application of existing algorithms would require terabytes of memory and hundreds of years of CPU time. We must therefore both maximize our resources by moving to supercomputers and minimize our requirements by algorithmic development. Here we will outline the nature of the challenge, present our current optimal algorithm, discuss its implementation as the MADCAP software package, and demonstrate its application to real data.
Abstract:
MAXIMA is a balloon-borne experiment mapping the sky in three frequency bands (150, 240 and 410GHz) of the microwave range (Lee et al 1998) with a 10 arcminute resolution (FWHM). To date the instrument has flown twice, in August 1998 and June 1999. The analysis of the data from the first flight has recently been completed (Hanany et al 2000), resulting in a high resolution, high signal-to-noise ratio map of the 120 deg patch of the sky and stringent constraints on the CMB anisotropy power spectrum down to angular scales of 10 arcminutes. In this talk I will briefly summarize the major problems encountered in the analysis of complex real-life CMB data such as the MAXIMA data set, as well as the analysis methods we have developed and implemented to tackle these problems in order to extract the cosmologically valuable information. In particular I will describe the practical algorithms for the noise estimation and map making (Stompor et al 2000). The purpose of this talk is twofold: it aims to demonstrate the robustness of the recent MAXIMA results and at the same time to provide an outline of some of the state-of-the-art CMB data analysis methods applied to the MAXIMA data sets.
Abstract:
I will present a method for jointly estimating the noise and the signal in the time-stream of a scanning experiment. I will then show the results of its implementation on the BOOMERanG98 data.
Abstract:
Point source removal poses a difficult problem in analysis of Planck data, since each source has a unique frequency spectrum and so standard multifrequency algorithms fail. Nevertheless, by combining the use of the continuous wavelet transform and a maximum-entropy algorithm, simulations suggest that it is in fact possible to reduce point source contamination of the CMB signal to an acceptably low level. On the one hand, the maximum entropy method is able to deal with faint point sources as an extra `noise' contribution. On the other hand, a technique based on the Mexican Hat wavelet transform can identify and subtract the brightest point sources. Therefore both methods complement each other. As a by-product this combined technique also allows the construction of point source catalogues at each of the Planck observing frequencies.
Abstract:
It is becoming increasingly clear that it is an enormous challenge to interpret data from high resolution cosmic microwave background experiments. In the case of the BOOMERANG mission our understanding of early Universe cosmology is already not limited any more by the availability of cosmic microwave background data but by our ability to analyse it. These problems will be even more acute for future experiments. The key difficulty lies in the scaling of the computational complexity of evaluating the likelihood functional, which increases as the cube of the number of pixels in the map. I will describe new developments and their impact on experimental design which reduce this scaling and make the analysis task feasible - exploiting the theoretician's toolbox in order to deal with experimental realities such as correlated receiver noise and partial sky coverage.
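The cubic scaling mentioned above can be made concrete with a brute-force evaluation of the Gaussian likelihood, -2 ln L = d^T C^-1 d + ln det C + N ln(2 pi), where d is the pixel vector and C the pixel-pixel covariance (signal plus noise). The sketch below uses a Cholesky factorisation, whose triple loop is the O(N^3) step. The function names and the two-pixel example are illustrative assumptions, not the methods described in the talk, which aim to avoid exactly this cost.

```python
import math


def cholesky(c):
    """Lower-triangular L with L L^T = C; the nested loops cost O(N^3)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def minus_two_log_likelihood(d, c):
    """-2 ln L for a zero-mean Gaussian map d with covariance matrix c."""
    low = cholesky(c)
    # Forward substitution solves L z = d, so that d^T C^-1 d = z . z
    z = []
    for i, row in enumerate(low):
        z.append((d[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    chi2 = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(d)))
    return chi2 + log_det + len(d) * math.log(2.0 * math.pi)


# Two correlated pixels as a toy map.
covariance = [[2.0, 1.0], [1.0, 2.0]]
pixels = [1.0, 1.0]
print(minus_two_log_likelihood(pixels, covariance))
```

Doubling the number of pixels multiplies the factorisation cost by roughly eight, and the factorisation has to be repeated for every trial set of model parameters.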
Abstract:
The Microwave Anisotropy Probe (MAP) is a NASA Mid-Class Explorer (MIDEX) mission that will launch in the Spring of 2001. MAP will observe the cosmic microwave background anisotropy over the full sky at 0.21 degree angular resolution. Detailed measurements of the anisotropy to this angular scale will provide a wealth of information about the physics of the early universe. After a brief overview of the MAP mission I will describe the data processing and analysis plans we have with an emphasis on the calibration and map-making procedure, and the method we plan to use to estimate the angular power spectrum of the map. The MAP hardware is produced by the Goddard Space Flight Center in partnership with Princeton University. Additional science team members are at Brown U., NRAO, the U. of British Columbia, U. of Chicago, and UCLA.
Abstract:
ESA's satellite Planck will image the microwave sky with unprecedented sensitivity from 30 to 850 GHz. Within the IDIS (Integrated Data Information System) project we have started to develop the data model for Planck Time Ordered Data and Planck Full Sky Maps and to implement it using a three-tier architecture which allows complete separation between data storage and processing (i.e. the data users need not be aware of the storage means or location). This Data Management Component is already being used for simulation activities and the modeling of some foreground components. We have ingested several Galactic surveys into the database and used the "scientific" data access interface to process the data. The Full Sky Map data structure utilises the HEALPix scheme of pixelisation of the sphere. We have been able to obtain consistent measures of the angular power spectrum of the Galactic radio continuum emission between 408 and 2417 GHz right away. Galactic polarized emissions have also been analysed with the above tools. Simulated Time Ordered Data of polarized signals at different frequencies can thus be obtained by convolving simulated skies of CMB and Galactic foreground with the polarized beam patterns.
Abstract:
The advent of large memory parallel supercomputers and adaptive resolution numerical algorithms permits 3D simulations in astrophysics and cosmology of increasing realism. Analyzing the results of the largest feasible simulations is challenging due to the enormity of the 4D data sets (3D + time), typically terabytes per run, and their multivariate complexity. In addition, adaptive resolution algorithms, such as adaptive mesh refinement, produce multi-scale data requiring new approaches to data analysis and visualization. In this talk, I review the state-of-the-art in adaptive resolution numerical algorithms and associated data analysis techniques. Simulations are commonly analyzed in two powerful but complementary ways: in the physical frame (3D analysis) and in the observer's frame (2D analysis). The latter requires synthetic images and spectra be constructed as an intermediate step. We refer to this as numerical observations. We describe our pilot effort to build a Numerical Observatory on the World Wide Web for accessing and analysing the results of high resolution cosmological simulations.
Abstract:
Many decades ago the search for variable stars was one of the main areas of astrophysical research. Such searches, conducted with CCD detectors rather than with photographic plates, became a by-product of several projects searching for gravitational microlensing events towards the Magellanic Clouds and the Galactic Bulge: DUO, EROS, MACHO, and OGLE. These searches demonstrated that it is possible and practical to process in near real time the photometry of tens of millions of stars every night, and to discover hundreds of thousands of variable stars. A limited subset of new variable star catalogs has been published, but no comprehensive database of all photometric results has become public so far. In the last few years much broader, but shallower, searches have been undertaken, and many others are at various stages of development or planning. There is a need to develop a system that would allow all these data to be processed and posted on the Internet in real time. The data set related to the variability of point sources is made up of relatively few data types, hence such a system may be relatively easy to develop. Yet the data may be diverse enough to be interesting to a large number of users, professional as well as amateur, making it possible to do real time virtual observing as well as data mining.
Abstract:
The MACHO search for microlensing of LMC and Bulge stars by compact objects in the Galaxy started in 1992 and concluded in December 1999. The accumulated database of widefield CCD imaging and photometry now contains 1.45 Terabytes of pixel data and the lightcurves of 11.9 million LMC stars, representing more than one billion individual photometric measurements. The systematic search for time-varying signals in this database with the characteristics of microlensing events has consumed the entire 25+ person science team for a good fraction of the 8 year project duration. In this review, I will outline the evolution of our thinking with respect to signal detection, background estimation, calculation of efficiency, data processing, data management and data mining. In particular I will focus on the lessons learned and give guidelines for future large scale searches for microlensing and for time-variable event detection experiments in general.
Abstract:
The FORS Deep Field (FDF) is an enterprise to obtain a deep field from ground based observations with the VLT on a field that is comparable to the HDF in depth but exceeds its area by more than an order of magnitude. Most of the imaging observations have been completed and spectroscopic follow-up observations have now started. The main results of the photometric studies will be described. Since the observations have been obtained over a period of more than one year, they can also be investigated for temporal variations. We will present the results on photometric variability, discuss them in the context of supernova searches, GRB counterparts and AGN variability, and derive limits on the signatures expected from known populations of variable sources in deep photometric investigations.
Abstract:
Both the ROSAT All-Sky Survey and many pointed observations have produced large samples of previously unidentified X-ray sources, both inside and outside of star forming regions. With optical and infrared follow-up observations of ROSAT Survey sources, several hundred new low-mass late-type pre-main sequence (T Tauri) stars were found, inside and around all known star forming regions as well as along the Gould Belt, most of which have spectral types G and K. With deep pointed observations, one can find fainter, low-mass objects: with follow-up observations of X-ray sources (38 ksec PSPC pointing) in the Chamaeleon I dark cloud, a site of on-going intermediate- and low-mass star formation, several brown dwarfs with 40 to 80 Jupiter masses were found, the first X-ray detected brown dwarfs, having spectral types M6 to M8. Both these samples, young nearby stars and young brown dwarfs, are well suited for direct imaging searches for extra-solar planets as companions, because (i) such substellar companions are still relatively bright when young, (ii) the primary objects are nearby, and (iii) brown dwarfs are faint themselves, so that the dynamic range is not that large.
Abstract:
Gamma-ray bursts provide what is probably one of the messiest of all astrophysical data sets. Burst class properties are indistinct, as overlapping characteristics of individual bursts are convolved with effects of instrumental and sampling biases. Despite these complexities, data mining techniques have allowed new insights to be made about gamma-ray burst data. We demonstrate how data mining techniques have simultaneously allowed us to learn about gamma-ray burst detectors and data collection, cosmological effects in burst data, and properties of burst subclasses. We discuss the exciting future of this field, and the web-based tool we are developing with support from the NASA AISR Program to allow others to join us in gamma-ray burst classification.
Abstract:
We present the results of our method to "mine" the blazar sky, i.e., to select blazar candidates with very high efficiency. The method is based on the cross-correlation between public radio and X-ray catalogs and has resulted in two surveys, the Deep X-ray Radio Blazar Survey (DXRBS) and the "Sedentary" BL Lac survey. We show that data mining is vital to select sizeable, deep samples of these rare active galactic nuclei.
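The positional step behind a catalog cross-correlation of the kind mentioned above can be sketched as follows. This is a minimal illustration only, not the selection actually used for DXRBS or the Sedentary survey: the function names, the brute-force nearest-neighbour search and the 10 arcsec radius are all assumptions made for the example.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def crossmatch(cat_a, cat_b, radius_arcsec):
    """Nearest neighbour in cat_b for each source in cat_a, kept if within radius.

    cat_a, cat_b: lists of (ra_deg, dec_deg). Returns (i, j, sep_arcsec) tuples.
    Brute force, O(len(cat_a) * len(cat_b)); fine for a sketch, not for surveys.
    """
    vecs_b = [unit_vector(ra, dec) for ra, dec in cat_b]
    matches = []
    for i, (ra, dec) in enumerate(cat_a):
        va = unit_vector(ra, dec)
        best_j, best_sep = None, None
        for j, vb in enumerate(vecs_b):
            # Chord length -> angle; stable for the small separations of interest.
            chord = math.dist(va, vb)
            sep = math.degrees(2.0 * math.asin(min(1.0, chord / 2.0))) * 3600.0
            if best_sep is None or sep < best_sep:
                best_j, best_sep = j, sep
        if best_j is not None and best_sep <= radius_arcsec:
            matches.append((i, best_j, best_sep))
    return matches

# Hypothetical positions: only the first radio source has an X-ray counterpart.
radio = [(10.0, 20.0), (150.0, -30.0)]
xray = [(10.001, 20.0), (200.0, 5.0)]
matches = crossmatch(radio, xray, radius_arcsec=10.0)
```

For catalogs of realistic size the inner loop would be replaced by a spatial index; the brute-force form is kept here only to make the matching criterion explicit.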
Abstract:
The Brera Multiscale Wavelet (BMW) catalog is now completed and publicly available. It is based on an algorithm developed for multi-scale detection and analysis of all the public ROSAT/HRI fields. We describe the procedure of source detection and characterization and the general properties of the catalog, such as sensitivity, sky coverage and correlations with other optical/IR/Radio catalogs. First applications are the search for variable sources (e.g. 1BMWJ080622.8+152732 = RX J0806.3+1527; Israel et al. 2000) and a survey of distant clusters of galaxies, based on the extent parameter provided by the catalog.
Abstract:
Plate archives contain hundreds of thousands of individual images, taken from the beginning of the last century up to the present. The Sonneberg Plate Archive, a collection of some 270,000 plates, is in the process of being digitized. First investigations on the basis of selected fields in the Orion/Taurus/Auriga region have been conducted. 300 more or less randomly chosen stars were examined for variability on about 500 plates taken between 1960 and 1996. Although the intrinsic photometric accuracy of individual data points is only of the order of 0.2 mag, the findings were a surprise: 1. It turned out that significantly more than 50 percent of the stars have to be regarded as variable. Many HIPPARCOS-constant stars reveal variability on long time scales. 2. Among the stars investigated, new types of variability have been detected. We found irregular stars, objects with cyclic variations with periods of some thousand days and amplitudes of several tenths of a magnitude, and stars whose brightness slowly increases and decreases by a few hundredths of a magnitude over decades. The talk gives a brief overview of the potential of digitized plate archives for detecting and investigating long-term variability and shows first results.
Abstract:
The recent availability of wide-field imagers has enabled the execution of customized digital imaging surveys with characteristics that are rapidly making them major sources of astronomical data for observations with 8m-class telescopes. Foreseeing this need for the VLT, the ESO Imaging Survey (EIS) program was established to carry out public imaging surveys for the ESO community. The program has provided a steady stream of data from which samples can be drawn for observations with the broad array of instruments foreseen for the VLT. EIS has also served as a staging phase for the development of suitable survey pipeline software, tools and database. This talk reviews the goals, framework and the technical/scientific achievements of the program as well as its current status and future perspectives.
Abstract:
No Abstract.
Abstract:
We present the basics and first results of an algorithm developed to find candidate LSB galaxies on the DPOSS plates, in the framework of the CRoNaRio collaboration. The method uses a convolution with a series of compensated-profile exponential filters to enhance features that are extremely faint on the original plate scans. The filtering is applied to the images of the scans after a pre-cleaning with standard reduction software (SExtractor). The search has been done on both the J (blue) and F (red) plates, with special emphasis on the latter. Initial results from first CCD-based optical follow-ups, performed with the ESO 3.6m telescope at La Silla and the Telescopio Nazionale Galileo at La Palma, Canary Islands, are presented as well.
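The filtering idea in the abstract above can be illustrated with a small sketch. It assumes that "compensated" means the exponential profile has its mean subtracted so that the kernel sums to zero and a flat background gives no response; the kernel size, the trial scale lengths and the toy image are invented for the example and are not taken from the CRoNaRio work.

```python
import numpy as np

def compensated_exponential_kernel(scale_px, half_size):
    """Exponential profile minus its mean, so the kernel sums to zero."""
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    kernel = np.exp(-np.hypot(x, y) / scale_px)
    return kernel - kernel.mean()

def convolve_periodic(image, kernel):
    """Convolve via FFT, treating the image as periodic (edges wrap around)."""
    padded = np.zeros(image.shape, dtype=float)
    ky, kx = kernel.shape
    padded[:ky, :kx] = kernel
    # Shift the kernel centre to pixel (0, 0) so the output is not displaced.
    padded = np.roll(padded, (-(ky // 2), -(kx // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

# Toy scan: flat background of 100 plus a faint exponential blob at (32, 32).
yy, xx = np.mgrid[0:64, 0:64]
background = np.full((64, 64), 100.0)
blob = 0.5 * np.exp(-np.hypot(xx - 32, yy - 32) / 4.0)

# A "series" of filters: one response map per trial scale length (pixels).
responses = {
    scale: convolve_periodic(background + blob,
                             compensated_exponential_kernel(scale, 12))
    for scale in (2.0, 4.0, 8.0)
}
flat_response = convolve_periodic(background,
                                  compensated_exponential_kernel(4.0, 12))
```

Because the kernel sums to zero, `flat_response` vanishes to rounding error, while the blob produces a positive response at its position that stands out from the rest of the map.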
Abstract:
The Calar Alto Deep Imaging Survey (CADIS) is a combination of a medium deep (R~23.5mag) multi colour survey and a deep emission line survey. The multi colour survey consists of three (B, R, J or K) broad band filters and up to 13 medium band filters with a resolution of ~40. The emission line survey is done in three wavelength intervals, each of which is probed with five settings of an imaging Fabry-Perot interferometer. The main objectives of CADIS are twofold: (i) a statistical analysis of different object classes such as stars and galaxies, and (ii) the extraction of individual rare objects such as low mass stars, extremely red objects and high-redshift (z~4.7-6.5) candidates. In this contribution we outline the methods and implementations developed in CADIS to transform the raw data, arriving at a monthly rate from the telescopes, into information on individual objects to be used for further astrophysical investigations. The main steps of this data flow system, e.g. standard reduction, object extraction and photometry, the combination of object lists from the ~30 different wavelengths into a master list, and the classification of objects, are presented and discussed in detail. Finally, as proof of the capabilities of our system, we give scientific results extracted from the CADIS data base. We show our K-band number counts, the galaxy luminosity function in the range 0.3<z<1.1 and candidates for z=5.7 Lyman-alpha galaxies.
Abstract:
Access to the "virtual sky" -- the federation of astronomical data archives, object catalogs, and associated information services -- requires a common framework for requesting, retrieving, and manipulating information from diverse, distributed resources. We must make it possible to seamlessly integrate data from the new all-sky surveys, enabling cross-correlations between multi-Terabyte catalogs and providing transparent access to the underlying image or spectral data. Success requires high performance computational systems, high bandwidth network services, and agreed upon standards for the exchange of metadata.
Abstract:
In this talk we will present the application of new statistical analyses to large, multi-dimensional databases. This work is part of a new initiative at Carnegie Mellon/Univ. of Pittsburgh, a multi-disciplinary collaboration between astrophysicists, statisticians and computer scientists. We will discuss specific examples we are working on using the Sloan Digital Sky Survey data, which include clustering analyses (to optimally represent the large scale structure in the universe and find clusters of galaxies), spectral classification (choosing the best basis for these astronomical data), fast n-point correlation functions and anomaly searches in multi-dimensional space (use of Bayes Nets and Mixture Models). These techniques can be extended to other data sources and could be used as part of any "toolkit" for the Virtual Observatory.
Abstract:
I will describe a pixon image reconstruction algorithm and its application to the problem of extracting galaxy clusters from multiwavelength CMB data. Results will be shown for a simulated Planck surveyor observation.
Abstract:
The spectral distribution of celestial objects carries essential information on the physical processes that take place in them. From multispectral images obtained through colored filters, each pixel may be classified. This classification suffers from a drawback, mixing: each pixel value can be considered as the result of a combination of different sources. Linear mixing was adopted as a valid model for this study. We examined the ability of Blind Source Separation (BSS) methods to reveal interesting features that could help to improve the description of celestial objects. The Karhunen-Loeve expansion (KLE) is the oldest approach. The cross correlation matrix of the images is first computed and a singular value decomposition is performed. The sources are then computed as weighted means of the images, with weights given by the eigenvectors. For Gaussian Probability Density Functions (PDFs), uncorrelated data are equivalent to independent data, and KLE is sufficient to separate the sources. But the PDFs are generally not Gaussian, and a set of BSS methods has been developed for that case. Second order BSS algorithms are based on the hypothesis of spatially correlated sources. The cross correlation between shifted sources taken two by two is decreased, while the correlation of each source with itself is increased. SOBI is an efficient algorithm of this kind (1). We adapted the algorithm from 1D to 2D fields and applied it in different variants. Other algorithms minimize or maximize contrast functions (2) based on higher order cumulants, as in JADE (3). With FastICA, non-Gaussianity is measured by a fixed-point algorithm using an approximation of negentropy through a neural network (4). Many experiments were done with the KLE, SOBI, JADE and FastICA algorithms. Different tools were applied in order to select the best decomposition. In particular, we introduced a test based on the mutual information between the sources.
We tested these tools on WFPC2 HST images of the Seyfert radio galaxy 3C120. As the noise is not stationary, a generalized Anscombe transform was used to stabilize the variance (5). The source decomposition obtained from the selected BSS methods does not depend strongly on the method. The BSS algorithms extracted sources which seem to correspond to independent physical components. We noted that SOBI, based on correlations in a large region around a pixel, gave nearly the same decomposition as the BSS methods based on local higher order statistics. That fact lends some confidence to the resulting decomposition. Even if the linear model is not fully realistic for processing celestial images, the application of different algorithms gives quite similar decompositions. From a physical point of view, the resulting sources correspond mainly to real phenomena. BSS can therefore be considered an interesting exploratory tool, which can suggest a new analytical process to the user.
1. Belouchrani A., Abed-Meraim K., Cardoso J.F., Moulines E., 1997, IEEE Trans. SP 45, p. 434
2. Comon P., 1994, Signal Processing 36, p. 287
3. Cardoso J.F., Souloumiac A., 1993, IEE Proceedings-F 40, p. 362
4. Hyvärinen A., 1999, IEEE Transactions on Neural Networks 10, p. 626
5. Murtagh F., Starck J.L., Bijaoui A., 1995, Astron. Astroph. Sup. Ser. 112, p. 179
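The KLE step described in this abstract (cross correlation matrix of the images, singular value decomposition, sources as eigenvector-weighted combinations of the images) can be sketched roughly as below. This is an illustrative reading of the text, not the authors' code: the function name, the mean subtraction and the synthetic three-band test data are assumptions.

```python
import numpy as np

def kle_decompose(images):
    """Karhunen-Loeve expansion of a stack of co-registered images.

    images: array of shape (n_bands, ny, nx).
    Returns (sources, eigenvalues); sources has the same shape as images.
    """
    n_bands = images.shape[0]
    pixels = images.reshape(n_bands, -1).astype(float)
    pixels -= pixels.mean(axis=1, keepdims=True)      # remove each band's mean
    corr = pixels @ pixels.T / pixels.shape[1]        # band-by-band correlation matrix
    eigvecs, eigvals, _ = np.linalg.svd(corr)         # symmetric matrix: SVD gives eigenvectors
    sources = eigvecs.T @ pixels                      # eigenvector-weighted sums of the bands
    return sources.reshape(images.shape), eigvals

# Synthetic test: three random "sources" mixed linearly into three bands.
rng = np.random.default_rng(0)
true_sources = rng.normal(size=(3, 32, 32))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, 0.4],
                   [0.2, 0.6, 1.0]])
images = np.tensordot(mixing, true_sources, axes=1)

sources, eigvals = kle_decompose(images)
flat = sources.reshape(3, -1)
source_corr = flat @ flat.T / flat.shape[1]   # diagonal: the outputs are uncorrelated
```

The outputs are only decorrelated, which, as the abstract notes, amounts to independence for Gaussian data; the higher order methods (SOBI, JADE, FastICA) go beyond this step and are not reproduced here.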
Abstract:
The talk will present an overview of the role of databases in Astronomy. Given the exponential growth in our data collection capabilities we are facing an immediate data avalanche. The usual techniques of storing and accessing data do not scale into this regime. Astronomers should make use of recent advances in large scalable database technology. An overview of current database technologies will be presented, with a special emphasis on indexing and query techniques, in particular where spatial information is involved. Scalability issues will also be considered. A high-level overview of the SDSS archive will be presented as a case study.
Abstract:
The SDSS Science Archive (SX) is one of the first of a new generation of multi-Tb astronomical archives. The SX was designed to enable scientific data mining on the terabyte scale as well as interactive data exploration. In order to make the data most efficiently accessible to the SDSS community, the master archive is replicated at several mirror sites (local archives). Each archive site consists of a distributed object-oriented database that is accessible via a client-server interface. The SX client is a lightweight Tcl/Tk GUI that can be run on any platform (including laptops). SDSS queries are formulated in SXQL, which supports the basic syntactical elements of SQL along with some object-oriented extensions and several astronomy-related language macros. The SX server combines a fully multithreaded query engine with a distributed parallel architecture, splitting the data among multiple hosts and allowing for parallel, scalable I/O and parallel data analysis. Each query is parsed into a query execution tree, each of whose nodes is executed as a separate thread. Nodes that access data on remote hosts (partitions) are executed by remote slave servers that fetch the data locally. This distributed and multithreaded design allows query execution to be optimized and dynamically load-balanced for any type of multi-processor architecture, from SMP machines to Beowulf-type clusters. The server also uses two fast indices - a Hierarchical Triangular Mesh quad-tree spatial index (see Kunszt et al.), and a multi-dimensional k-d tree flux index - to analyze and determine the a priori cost of the query prior to executing it. This greatly facilitates data mining in such a large archive.
Abstract:
The Hierarchical Triangular Mesh (HTM) is a partitioning scheme for the surface of the unit sphere. The pole problems of spherical coordinate mappings are completely resolved with this scheme, since no singularities are involved. The HTM is simple enough to make a very fast algorithm possible, extending its capabilities further. At sufficiently deep levels it can be used as a unique object identifier, and at higher levels to cluster objects together. This scheme has been used successfully in some of the latest large-area astronomical databases such as 2MASS, GSC-II and SDSS. The HTM can be used to ease the cross-matching of objects across large datasets. We also define a geometric querying mechanism that allows fast lookups of HTM identifiers given an arbitrary area on the sphere.
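A rough sketch of how such a hierarchical identifier can be computed is given below. It follows the commonly described construction (eight spherical triangles from an octahedron, each split into four by the edge midpoints), but the vertex ordering, the integer labels 8 to 15 for the root triangles and the function names are assumptions made for illustration and are not taken from the HTM software itself.

```python
import math

# Octahedron vertices and the eight root triangles; corners are listed
# counter-clockwise as seen from outside the sphere.
_V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
_ROOTS = [  # (integer label, corner indices); the labels are an assumption
    (8, (1, 5, 2)), (9, (2, 5, 3)), (10, (3, 5, 4)), (11, (4, 5, 1)),
    (12, (1, 0, 4)), (13, (4, 0, 3)), (14, (3, 0, 2)), (15, (2, 0, 1)),
]
_EPS = 1e-12  # tolerance so points on an edge are still assigned somewhere

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _midpoint(a, b):
    """Midpoint of a great-circle arc: normalised vector sum."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    norm = math.sqrt(_dot(s, s))
    return (s[0] / norm, s[1] / norm, s[2] / norm)

def _inside(p, a, b, c):
    """True if p lies on the inner side of all three edges of triangle abc."""
    return (_dot(_cross(a, b), p) >= -_EPS and
            _dot(_cross(b, c), p) >= -_EPS and
            _dot(_cross(c, a), p) >= -_EPS)

def htm_id(ra_deg, dec_deg, depth):
    """Integer identifier of the triangle containing (RA, Dec) at a given depth."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    p = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    for label, (i, j, k) in _ROOTS:
        tri = (_V[i], _V[j], _V[k])
        if _inside(p, *tri):
            break
    ident = label
    a, b, c = tri
    for _ in range(depth):
        w0, w1, w2 = _midpoint(b, c), _midpoint(a, c), _midpoint(a, b)
        children = [(a, w2, w1), (b, w0, w2), (c, w1, w0), (w0, w1, w2)]
        for child_index, child in enumerate(children):
            if _inside(p, *child):
                break
        # If rounding left no match, the loop ends on the central child (3).
        ident = ident * 4 + child_index   # two more bits per level
        a, b, c = child
    return ident
```

Each level appends two bits, so dropping the last two bits of a deep identifier gives the identifier of the parent triangle. This is what allows the same number to serve as a fine-grained label at deep levels and as a clustering key at shallower ones.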
Abstract:
Mission data sets are increasing in size and complexity. To enable efficient processing and storage of such data sets a tessellation scheme may be adopted. The Hierarchical Triangular Mesh (HTM) and the Hierarchical Equal Area isoLatitude Pixelisation (HEALPix) are two popular schemes, each with software tools available to exploit its particular properties. The initial drivers behind these two schemes are quite different, but as their use continues the complementary facilities of each package will become more relevant. In this presentation we describe the motivation and facilities of HEALPix and HTM, ask whether it is possible to combine these schemes or to use them in a combined manner, and say why we may wish to do this.
Abstract:
The ability to extract knowledge from very large data catalogues resulting from sky surveys or from deep wide fields, observed at various wavelength ranges, is a major challenge of our science, in the context of the future virtual observatories. Building on the expertise of the Strasbourg astronomical Data Center (CDS) in the domain of data management and validation, several studies have been conducted recently by the scientific teams in Strasbourg, specifically in the domain of stellar populations and galactic structure. We will try to draw some lessons from this work for the strategies to be adopted when mining very large data sets. In the first part, we will discuss some of the statistical methods and database tools which can best serve as a basis for cross-wavelength comparisons, and we will stress the need for reference databases of images and point sources. In the second part we will explore some of the statistical methodologies which can be used for successfully mining very large catalogues: classification, indexation, inverse methods, etc. We will illustrate the discussion with examples of studies based on the Tycho survey of the 1 million brightest stars (recently extended to 2.4 million under the name of Tycho-2, by Hoeg et al. 2000) and on the near-infrared DENIS survey of the southern sky (Epchtein et al. 1999). Cross-matching optical surveys with X-ray and infrared data makes it possible to obtain a new view of the stellar populations of our Galaxy and of its structure at a larger scale, for instance through the characterization of stellar ages or through the mapping of the interstellar medium.
Abstract:
Besides the large surveys well suited for data mining, a very large number of more heterogeneous catalogues are available and are potentially useful for knowledge discovery, especially when possibly variable phenomena are surveyed. The presentation will focus on the homogenisation of metadata, which is a required step for finding the interesting datasets in the large collections currently accessible. Some typical applications and illustrative examples will be presented.
Abstract:
The astronomical records in ancient Chinese documents have been checked in order to identify the youngest ROSAT X-ray sources with historical SN events. A SN event recorded in an inscription on bones or tortoise shells from the Yin dynasty (14th-11th century BC) is suggested to be identified with the ROSAT source J1714.2-3939. The inscription reads: "On the Jisi day, the 7th day in the month, a big new star appeared in the company of Huo star". The word Huo in ancient Chinese astronomical documents stands for the star Antares or the constellation around Antares. It seems that the SN could be considered to be in the company of Antares, a star 17 degrees away. The temperature of the source J1714.2-3939 derived from the X-ray observations indicates an age of about 3000-4500 yr. Therefore both the age and the position of J1714.2-3939 accord with the record in the inscription. In addition, evidence, with some uncertainty, for identifying two further sources, J0852.0-4642 and J0906.7-520659, with ancient Chinese records is also discussed.
Last modified: 07/08/2000