### Abstract

We develop an algorithm for estimating parameters of a distribution sampled with contamination. We employ a statistical technique known as "expectation maximization" (EM). Given models for both member and contaminant populations, the EM algorithm iteratively evaluates the membership probability of each discrete data point, then uses those probabilities to update parameter estimates for member and contaminant distributions. The EM approach has wide applicability to the analysis of astronomical data. Here we tailor an EM algorithm to operate on spectroscopic samples obtained with the Michigan-MIKE Fiber System (MMFS) as part of our Magellan survey of stellar radial velocities in nearby dwarf spheroidal (dSph) galaxies. These samples, to be presented in a companion paper, contain discrete measurements of line-of-sight velocity, projected position, and pseudo-equivalent width of the Mg-triplet feature, for 1000-2500 stars per dSph, including some fraction of contamination by foreground Milky Way stars. The EM algorithm uses all of the available data to quantify dSph and contaminant distributions. For distributions (e.g., velocity and Mg-index of dSph stars) assumed to be Gaussian, the EM algorithm returns maximum-likelihood estimates of the mean and variance, as well as the probability that each star is a dSph member. These probabilities can serve as weights in subsequent analyses. Applied to our MMFS data, the EM algorithm identifies more than 5000 stars as probable dSph members. We test the performance of the EM algorithm on simulated data sets that represent a range of sample size, level of contamination, and amount of overlap between dSph and contaminant velocity distributions. 
The simulations establish that for samples ranging from large (N ~ 3000, characteristic of the MMFS samples) to small (N ~ 30, resembling new samples for extremely faint dSphs), the EM algorithm distinguishes members from contaminants and returns accurate parameter estimates much more reliably than conventional methods of contaminant removal (e.g., sigma clipping).
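
The E-step/M-step loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses velocity only (the paper also uses Mg index and projected position), it assumes the contaminant velocity density is known in advance, and all names and simulated values (`em_membership`, `sigma_clip`, a member population at 50 ± 10 km/s, a broad contaminant population) are hypothetical choices made for illustration. A simple iterative sigma clip is included for comparison.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_membership(v, contaminant_pdf, n_iter=500, tol=1e-8):
    """Two-component EM: Gaussian members plus a fixed contaminant density.

    Assumption (not from the paper): contaminant_pdf is known and held fixed.
    Returns member mean, member variance, member fraction, and per-star
    membership probabilities.
    """
    v = np.asarray(v, dtype=float)
    p_contam = contaminant_pdf(v)                    # evaluated once, never updated
    mean, var, frac = np.median(v), np.var(v), 0.5   # crude starting guesses
    prob = np.full(v.size, 0.5)
    for _ in range(n_iter):
        # E-step: membership probability of each star given current parameters
        p_member = frac * gaussian_pdf(v, mean, var)
        prob = p_member / (p_member + (1.0 - frac) * p_contam)
        # M-step: probability-weighted mean, variance, and member fraction
        new_mean = np.sum(prob * v) / np.sum(prob)
        new_var = np.sum(prob * (v - new_mean) ** 2) / np.sum(prob)
        new_frac = np.mean(prob)
        converged = abs(new_mean - mean) < tol and abs(new_var - var) < tol
        mean, var, frac = new_mean, new_var, new_frac
        if converged:
            break
    return mean, var, frac, prob

def sigma_clip(v, n_sigma=3.0, n_iter=20):
    """Conventional iterative clipping: drop stars beyond n_sigma of the mean."""
    v = np.asarray(v, dtype=float)
    keep = np.ones(v.size, dtype=bool)
    for _ in range(n_iter):
        mean, std = v[keep].mean(), v[keep].std()
        new_keep = np.abs(v - mean) < n_sigma * std
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return v[keep].mean(), v[keep].var(), keep

# Simulated sample (hypothetical values): 700 members, 300 contaminants.
rng = np.random.default_rng(0)
members = rng.normal(50.0, 10.0, size=700)
contaminants = rng.normal(20.0, 60.0, size=300)
v = np.concatenate([members, contaminants])

# Assumed-known contaminant density, matching the simulated contaminants.
contam_pdf = lambda x: gaussian_pdf(x, 20.0, 60.0 ** 2)

mean_em, var_em, frac_em, prob = em_membership(v, contam_pdf)
mean_clip, var_clip, keep = sigma_clip(v)

print("EM:         mean, sigma, member fraction =",
      mean_em, np.sqrt(var_em), frac_em)
print("sigma clip: mean, sigma, stars kept      =",
      mean_clip, np.sqrt(var_clip), keep.sum())
```

The array `prob` plays the role of the membership probabilities mentioned in the abstract and could be passed as weights to a later analysis, whereas sigma clipping only yields a hard keep/reject mask.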

| Original language | English (US) |
|---|---|
| Pages (from-to) | 3109-3138 |
| Number of pages | 30 |
| Journal | Astronomical Journal |
| Volume | 137 |
| Issue number | 2 |
| DOIs | https://doi.org/10.1088/0004-6256/137/2/3109 |
| State | Published - 2009 |

### Keywords

- Galaxies: dwarf
- Galaxies: individual (Carina, Fornax, Sculptor, Sextans)
- Galaxies: kinematics and dynamics
- Local Group
- Techniques: radial velocities

### ASJC Scopus subject areas

- Space and Planetary Science
- Astronomy and Astrophysics

### Cite this

*Astronomical Journal*, *137*(2), 3109-3138. https://doi.org/10.1088/0004-6256/137/2/3109

**Clean kinematic samples in dwarf spheroidals: An algorithm for evaluating membership and estimating distribution parameters when contamination is present.** / Walker, Matthew G.; Mateo, Mario; Olszewski, Edward W.; Sen, Bodhisattva; Woodroofe, Michael.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Clean kinematic samples in dwarf spheroidals

T2 - An algorithm for evaluating membership and estimating distribution parameters when contamination is present

AU - Walker, Matthew G.

AU - Mateo, Mario

AU - Olszewski, Edward W.

AU - Sen, Bodhisattva

AU - Woodroofe, Michael

PY - 2009

Y1 - 2009

KW - Galaxies: dwarf

KW - Galaxies: individual (Carina, Fornax, Sculptor, Sextans)

KW - Galaxies: kinematics and dynamics

KW - Local Group

KW - Techniques: radial velocities

UR - http://www.scopus.com/inward/record.url?scp=64849117009&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=64849117009&partnerID=8YFLogxK

U2 - 10.1088/0004-6256/137/2/3109

DO - 10.1088/0004-6256/137/2/3109

M3 - Article

AN - SCOPUS:64849117009

VL - 137

SP - 3109

EP - 3138

JO - Astronomical Journal

JF - Astronomical Journal

SN - 0004-6256

IS - 2

ER -