Clean kinematic samples in dwarf spheroidals

An algorithm for evaluating membership and estimating distribution parameters when contamination is present

Matthew G. Walker, Mario Mateo, Edward W. Olszewski, Bodhisattva Sen, Michael Woodroofe

Research output: Contribution to journal › Article

75 Citations (Scopus)

Abstract

We develop an algorithm for estimating parameters of a distribution sampled with contamination. We employ a statistical technique known as "expectation maximization" (EM). Given models for both member and contaminant populations, the EM algorithm iteratively evaluates the membership probability of each discrete data point, then uses those probabilities to update parameter estimates for member and contaminant distributions. The EM approach has wide applicability to the analysis of astronomical data. Here we tailor an EM algorithm to operate on spectroscopic samples obtained with the Michigan-MIKE Fiber System (MMFS) as part of our Magellan survey of stellar radial velocities in nearby dwarf spheroidal (dSph) galaxies. These samples, to be presented in a companion paper, contain discrete measurements of line-of-sight velocity, projected position, and pseudo-equivalent width of the Mg-triplet feature, for 1000-2500 stars per dSph, including some fraction of contamination by foreground Milky Way stars. The EM algorithm uses all of the available data to quantify dSph and contaminant distributions. For distributions (e.g., velocity and Mg-index of dSph stars) assumed to be Gaussian, the EM algorithm returns maximum-likelihood estimates of the mean and variance, as well as the probability that each star is a dSph member. These probabilities can serve as weights in subsequent analyses. Applied to our MMFS data, the EM algorithm identifies more than 5000 stars as probable dSph members. We test the performance of the EM algorithm on simulated data sets that represent a range of sample size, level of contamination, and amount of overlap between dSph and contaminant velocity distributions. The simulations establish that for samples ranging from large (N ~ 3000, characteristic of the MMFS samples) to small (N ~ 30, resembling new samples for extremely faint dSphs), the EM algorithm distinguishes members from contaminants and returns accurate parameter estimates much more reliably than conventional methods of contaminant removal (e.g., sigma clipping).
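
The iteration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses line-of-sight velocity only (the paper also uses projected position and Mg index), and it assumes a contaminant population described by a fixed Gaussian with known mean and variance. The function names `gaussian_pdf` and `em_membership`, the starting guesses, and the synthetic numbers in the demo are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Normal probability density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_membership(v, contam_mean, contam_var, n_iter=200, tol=1e-8):
    """EM for a Gaussian member population plus a fixed Gaussian contaminant.

    Assumption for this sketch: the contaminant distribution is known and
    held fixed; only the member mean, member variance, and member fraction
    are updated.
    """
    v = np.asarray(v, dtype=float)
    # Starting guesses (arbitrary): sample median, sample variance, 50% members.
    mean, var, frac = np.median(v), np.var(v), 0.5
    for _ in range(n_iter):
        # E-step: membership probability of each star given current parameters.
        p_mem = frac * gaussian_pdf(v, mean, var)
        p_con = (1.0 - frac) * gaussian_pdf(v, contam_mean, contam_var)
        prob = p_mem / (p_mem + p_con)
        # M-step: probability-weighted estimates of mean, variance, fraction.
        new_mean = np.sum(prob * v) / np.sum(prob)
        new_var = np.sum(prob * (v - new_mean) ** 2) / np.sum(prob)
        new_frac = np.mean(prob)
        converged = abs(new_mean - mean) < tol and abs(new_var - var) < tol
        mean, var, frac = new_mean, new_var, new_frac
        if converged:
            break
    # prob comes from the last E-step performed.
    return mean, var, frac, prob

# Synthetic demo (made-up numbers): 500 members and 200 contaminants, in km/s.
rng = np.random.default_rng(0)
velocities = np.concatenate([
    rng.normal(100.0, 8.0, 500),   # hypothetical dSph members
    rng.normal(0.0, 60.0, 200),    # hypothetical foreground contaminants
])
mean, var, frac, prob = em_membership(velocities, contam_mean=0.0,
                                      contam_var=60.0 ** 2)
print(f"member mean = {mean:.2f}, dispersion = {np.sqrt(var):.2f}, "
      f"member fraction = {frac:.2f}")
```

The returned `prob` array plays the role of the membership probabilities mentioned in the abstract and could be used as weights in later analyses. Unlike sigma clipping, no star is discarded outright; each contributes in proportion to its membership probability.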

Original language: English (US)
Pages (from-to): 3109-3138
Number of pages: 30
Journal: Astronomical Journal
Volume: 137
Issue number: 2
DOI: 10.1088/0004-6256/137/2/3109
State: Published - 2009

Fingerprint

contamination
estimating
kinematics
contaminants
pollutant
stars
fibers
velocity distribution
maximum likelihood estimates
clipping
parameter
distribution
dwarf stars
data systems
dwarf galaxies
estimates
radial velocity
line of sight
simulation
fibre

Keywords

  • Galaxies: dwarf
  • Galaxies: individual (Carina, Fornax, Sculptor, Sextans)
  • Galaxies: kinematics and dynamics
  • Local Group
  • Techniques: radial velocities

ASJC Scopus subject areas

  • Space and Planetary Science
  • Astronomy and Astrophysics

Cite this

Clean kinematic samples in dwarf spheroidals: An algorithm for evaluating membership and estimating distribution parameters when contamination is present. / Walker, Matthew G.; Mateo, Mario; Olszewski, Edward W.; Sen, Bodhisattva; Woodroofe, Michael.

In: Astronomical Journal, Vol. 137, No. 2, 2009, p. 3109-3138.

@article{c20e2176fa4b44d6b7066351676926ce,
title = "Clean kinematic samples in dwarf spheroidals: An algorithm for evaluating membership and estimating distribution parameters when contamination is present",
abstract = "We develop an algorithm for estimating parameters of a distribution sampled with contamination. We employ a statistical technique known as {"}expectation maximization{"} (EM). Given models for both member and contaminant populations, the EM algorithm iteratively evaluates the membership probability of each discrete data point, then uses those probabilities to update parameter estimates for member and contaminant distributions. The EM approach has wide applicability to the analysis of astronomical data. Here we tailor an EM algorithm to operate on spectroscopic samples obtained with the Michigan-MIKE Fiber System (MMFS) as part of our Magellan survey of stellar radial velocities in nearby dwarf spheroidal (dSph) galaxies. These samples, to be presented in a companion paper, contain discrete measurements of line-of-sight velocity, projected position, and pseudo-equivalent width of the Mg-triplet feature, for 1000-2500 stars per dSph, including some fraction of contamination by foreground Milky Way stars. The EM algorithm uses all of the available data to quantify dSph and contaminant distributions. For distributions (e.g., velocity and Mg-index of dSph stars) assumed to be Gaussian, the EM algorithm returns maximum-likelihood estimates of the mean and variance, as well as the probability that each star is a dSph member. These probabilities can serve as weights in subsequent analyses. Applied to our MMFS data, the EM algorithm identifies more than 5000 stars as probable dSph members. We test the performance of the EM algorithm on simulated data sets that represent a range of sample size, level of contamination, and amount of overlap between dSph and contaminant velocity distributions. The simulations establish that for samples ranging from large (N $\sim$ 3000, characteristic of the MMFS samples) to small (N $\sim$ 30, resembling new samples for extremely faint dSphs), the EM algorithm distinguishes members from contaminants and returns accurate parameter estimates much more reliably than conventional methods of contaminant removal (e.g., sigma clipping).",
keywords = "Galaxies: dwarf; Galaxies: individual (Carina, Fornax, Sculptor, Sextans); Galaxies: kinematics and dynamics; Local Group; Techniques: radial velocities",
author = "Walker, {Matthew G.} and Mario Mateo and Olszewski, {Edward W.} and Bodhisattva Sen and Michael Woodroofe",
year = "2009",
doi = "10.1088/0004-6256/137/2/3109",
language = "English (US)",
volume = "137",
pages = "3109--3138",
journal = "Astronomical Journal",
issn = "0004-6256",
publisher = "IOP Publishing Ltd.",
number = "2",

}

TY - JOUR

T1 - Clean kinematic samples in dwarf spheroidals

T2 - An algorithm for evaluating membership and estimating distribution parameters when contamination is present

AU - Walker, Matthew G.

AU - Mateo, Mario

AU - Olszewski, Edward W.

AU - Sen, Bodhisattva

AU - Woodroofe, Michael

PY - 2009

Y1 - 2009

N2 - We develop an algorithm for estimating parameters of a distribution sampled with contamination. We employ a statistical technique known as "expectation maximization" (EM). Given models for both member and contaminant populations, the EM algorithm iteratively evaluates the membership probability of each discrete data point, then uses those probabilities to update parameter estimates for member and contaminant distributions. The EM approach has wide applicability to the analysis of astronomical data. Here we tailor an EM algorithm to operate on spectroscopic samples obtained with the Michigan-MIKE Fiber System (MMFS) as part of our Magellan survey of stellar radial velocities in nearby dwarf spheroidal (dSph) galaxies. These samples, to be presented in a companion paper, contain discrete measurements of line-of-sight velocity, projected position, and pseudo-equivalent width of the Mg-triplet feature, for 1000-2500 stars per dSph, including some fraction of contamination by foreground Milky Way stars. The EM algorithm uses all of the available data to quantify dSph and contaminant distributions. For distributions (e.g., velocity and Mg-index of dSph stars) assumed to be Gaussian, the EM algorithm returns maximum-likelihood estimates of the mean and variance, as well as the probability that each star is a dSph member. These probabilities can serve as weights in subsequent analyses. Applied to our MMFS data, the EM algorithm identifies more than 5000 stars as probable dSph members. We test the performance of the EM algorithm on simulated data sets that represent a range of sample size, level of contamination, and amount of overlap between dSph and contaminant velocity distributions. The simulations establish that for samples ranging from large (N ~ 3000, characteristic of the MMFS samples) to small (N ~ 30, resembling new samples for extremely faint dSphs), the EM algorithm distinguishes members from contaminants and returns accurate parameter estimates much more reliably than conventional methods of contaminant removal (e.g., sigma clipping).

AB - We develop an algorithm for estimating parameters of a distribution sampled with contamination. We employ a statistical technique known as "expectation maximization" (EM). Given models for both member and contaminant populations, the EM algorithm iteratively evaluates the membership probability of each discrete data point, then uses those probabilities to update parameter estimates for member and contaminant distributions. The EM approach has wide applicability to the analysis of astronomical data. Here we tailor an EM algorithm to operate on spectroscopic samples obtained with the Michigan-MIKE Fiber System (MMFS) as part of our Magellan survey of stellar radial velocities in nearby dwarf spheroidal (dSph) galaxies. These samples, to be presented in a companion paper, contain discrete measurements of line-of-sight velocity, projected position, and pseudo-equivalent width of the Mg-triplet feature, for 1000-2500 stars per dSph, including some fraction of contamination by foreground Milky Way stars. The EM algorithm uses all of the available data to quantify dSph and contaminant distributions. For distributions (e.g., velocity and Mg-index of dSph stars) assumed to be Gaussian, the EM algorithm returns maximum-likelihood estimates of the mean and variance, as well as the probability that each star is a dSph member. These probabilities can serve as weights in subsequent analyses. Applied to our MMFS data, the EM algorithm identifies more than 5000 stars as probable dSph members. We test the performance of the EM algorithm on simulated data sets that represent a range of sample size, level of contamination, and amount of overlap between dSph and contaminant velocity distributions. The simulations establish that for samples ranging from large (N ~ 3000, characteristic of the MMFS samples) to small (N ~ 30, resembling new samples for extremely faint dSphs), the EM algorithm distinguishes members from contaminants and returns accurate parameter estimates much more reliably than conventional methods of contaminant removal (e.g., sigma clipping).

KW - Galaxies: dwarf

KW - Galaxies: individual (Carina, Fornax, Sculptor, Sextans)

KW - Galaxies: kinematics and dynamics

KW - Local Group

KW - Techniques: radial velocities

UR - http://www.scopus.com/inward/record.url?scp=64849117009&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=64849117009&partnerID=8YFLogxK

U2 - 10.1088/0004-6256/137/2/3109

DO - 10.1088/0004-6256/137/2/3109

M3 - Article

VL - 137

SP - 3109

EP - 3138

JO - Astronomical Journal

JF - Astronomical Journal

SN - 0004-6256

IS - 2

ER -