1. INTRODUCTION
The Kepler mission is designed to determine the frequency of Earth-size planets in and near the habitable zone of solar-like stars via the detection of photometric transits (Borucki et al. 2010a; Koch et al. 2010). Kepler surveys more than 100,000 late-type dwarf stars in the solar neighborhood with visual magnitudes between 8 and 16 for >4 yr, looking for transits of planets around those stars. There are several astrophysical phenomena that can cause a false-positive detection that mimics a planetary transit on a target star. Approximately 40% of the transit-like signals detected by Kepler that have been deemed Kepler objects of interest (KOIs) have been determined to be due to false positives.
To increase the reliability of the determination of which KOIs are planetary candidates, it is important to identify as many of these false positives as possible. In many cases, the identification of false-positive KOIs is based on Kepler data alone because these KOIs have transit signals that are too small for conventional ground-based follow-up. This paper describes several distinct but complementary methods for using Kepler data to detect cases where the source of a transit-like event is offset from the target star’s position on the sky. These background false positives make up a substantial fraction of all false positives, with most of the other false positives being due to grazing eclipsing stellar companions associated with the target star. At low Galactic latitudes, “background false positives” account for almost 40% of all Kepler transit-like signals, with the fraction dropping to about 10% at high Galactic latitudes (see Fig. 1). Background false positives are detected in Kepler data by observing that the pixels that change during transit are offset from those that contain the image of the target star. Such cases are referred to as “active pixel offsets” (APOs). The methods described in this paper cannot detect all background transit sources: for example, they fail when the transit source is extremely close to the target star on the sky. However, they can identify a large percentage of background false positives. We believe that by identifying false positives that have an observable offset, the techniques described in this paper reduce the background false-positive rate in the planetary candidate catalog to below 10%.

Fig. 1.— Distribution of the fraction of transit signal sources that are offset from the target star, indicating a background false positive. For low Galactic latitudes around 40% of all Kepler KOIs are background false positives, while for mid to high Galactic latitudes the fraction drops to about 10%. This figure is based on data from Batalha et al. (2012).
The techniques described in this paper rely on pixel data returned from the Kepler spacecraft. Without this pixel data, the identification of background transit sources is much more difficult. Indeed, for dim target stars or for small planets with low signal-to-noise ratio (S/N) transits, ground-based follow-up typically will not suffice to identify background false positives. In such cases, background false-positive identification would be impossible using stellar light curves alone. Without the pixels, the star hosting the transit signal cannot be determined. Without knowing the star hosting the transit, the object causing the transit cannot be characterized. Therefore, the availability of the pixel data used to create the stellar light curves is a critical component of the success of any transit survey. This insight should drive the design of future transit survey missions.
In the rest of this section, we discuss background false positives in general, their identification via pixel analysis, and how that identification is used in the vetting of Kepler planet candidates. The bulk of this paper describes several techniques for performing pixel-level analysis to identify background false positives. We describe in § 2 the photometric centroid technique and in § 3 the use of difference images to localize the transit signal source. Pixel correlations are described in § 4. We briefly address the special case of saturated targets in § 5. Section 6 presents several perspectives on how well these techniques perform, with special emphasis on comparing the photometric centroid and difference image techniques.
Throughout this paper we use several example KOIs (Borucki et al. 2011a,b; Batalha et al. 2010a, 2012; C. Burke et al. 2013, in preparation). Some of these KOIs are now valid candidates, while others have been determined to be false positives. We give particular attention to two examples to illustrate our techniques: KOI-221, which is a Kepler target where the transit source location is observationally coincident with the target, and KOI-109, which is a Kepler target for which the transit source is clearly offset from the target star. The list of KOIs and their properties can be found at the NASA Exoplanet Archive, while the light curves and pixel data for all Kepler targets can be found at the Mikulski Archive for Space Telescopes.
1.1. Background False Positives
There are several astrophysical phenomena that can mimic a planetary transit on a specified target star. Brown (2003) distinguishes 12 combinations of giant planets and stars in eclipsing and transiting systems that can produce light curves mimicking a planet transiting a solitary primary star. Six of the combinations do not involve planets at all, and four others distort the transit light curve so that the size of the planet is indeterminate.
In this paper, we are concerned with those phenomena which are due to astrophysical sources that are not associated with the target star. These primarily include eclipsing binaries or large planet transits on stars that have flux in the pixels used to create the target star’s light curve. Because of dilution from the target star, even deep background eclipsing binaries often cannot be identified from the target star’s light curve alone. Analysis at the pixel level is required to identify the location of the transit signal source. We are particularly interested in cases where the transit signal’s source is sufficiently separated from the target star that we can measure a statistically significant offset between the target star and the transit source.
Additional sources of false positives that can be detected by the methods described in this paper include:
- 1. Very wide multiple star systems, where the transit source is gravitationally bound to the target star. When the separation between the target star and the companion hosting the transit signal source is large enough, the methods described in this paper can detect the offset.
- 2. Optical ghosts and electronic crosstalk (Caldwell et al. 2010) from planetary transits or eclipsing binaries elsewhere on the Kepler focal plane. When the image of the ghost or crosstalk falls on the target star’s pixels, but is sufficiently separated from the target star, these sources can be detected by the methods described in this paper. In addition, optical ghosts can have very nonstellar morphologies. Transit signals due to optical ghosts will exhibit these morphologies in several of the techniques described in this paper.
Our basic strategy is to measure the location of the transit source on the sky, compare that to the location of the target star, and declare the transit signal a false positive if the transit source location is significantly offset (more than three standard deviations, written >3σ) from the target star location, based on reliable data. All the methods of computing these offsets described in this paper use χ²-minimizing (least-squares) methods. Assuming Gaussian statistics, these offsets form a two-degree-of-freedom χ² distribution that has offsets >3σ due to random fluctuations about 1.11% of the time. As we will show in this paper, offset uncertainties follow an approximately Gaussian distribution in a statistical sense, though the uncertainty around individual targets may not be Gaussian.
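The 1.11% false-alarm rate quoted above can be checked numerically. The following is a minimal sketch, assuming the two offset components are independent Gaussians scaled by their uncertainties, so that the squared offset follows a two-degree-of-freedom χ² distribution whose tail probability is exp(−x/2). The function name is illustrative and is not part of any Kepler software.

```python
import math

def offset_false_alarm_probability(n_sigma: float) -> float:
    """Chance that a 2-D Gaussian offset exceeds n_sigma through noise alone.

    For two degrees of freedom the chi-square survival function is
    exp(-x / 2), evaluated here at x = n_sigma**2.
    """
    return math.exp(-0.5 * n_sigma ** 2)

p_3sigma = offset_false_alarm_probability(3.0)
print(f"{100 * p_3sigma:.2f}%")  # prints 1.11%
```

The same function shows how quickly the false-alarm rate falls if a stricter threshold than 3σ is adopted.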
1.2. Pixel Analysis to Identify the Location of the Transit Source
As mentioned in § 1.1, the background binary causing a transit signal can be very faint, indeed significantly fainter than the general background or the wings of the target star, and still mimic a planetary transit. Consider the case of an aperture that contains only a target star with constant flux $F$ and a background binary, with otherwise negligible sky background. If the background binary is $\Delta m$ magnitudes fainter than the target star, then the flux ratio of the background star to the target star is $\Delta F = 100^{-\Delta m/5}$. If the background binary has a fractional eclipse depth $d_{\rm back}$, then the total flux out of transit is $F_{\rm out} = F + F\Delta F$. In transit, the total flux is $F_{\rm in} = F + (1 - d_{\rm back})F\Delta F$. Therefore, the fractional observed depth in the aperture is

$$d_{\rm obs} = \frac{F_{\rm out} - F_{\rm in}}{F_{\rm out}} = \frac{d_{\rm back}\,\Delta F}{1 + \Delta F}. \tag{1}$$
In the case of a 14th magnitude target star and a 22nd magnitude background eclipsing binary with $d_{\rm back} = 0.5$, we get $d_{\rm obs} = 315$ ppm. A transit of this depth is easily detected in Kepler data and would mimic the transit of a small planet, though the 22nd magnitude background star would not be readily apparent in the Kepler data.
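The 315 ppm figure follows from the dilution argument above: with flux ratio ΔF = 100^(−Δm/5), the observed depth is d_back ΔF / (1 + ΔF). A minimal sketch of that arithmetic, assuming an aperture containing only the target and the background binary (the function and argument names are illustrative):

```python
def observed_depth(delta_mag: float, d_back: float) -> float:
    """Diluted transit depth seen in an aperture holding two stars.

    delta_mag: how many magnitudes fainter the background binary is
    d_back:    fractional eclipse depth of the background binary itself
    """
    flux_ratio = 100.0 ** (-delta_mag / 5.0)   # background / target flux
    return d_back * flux_ratio / (1.0 + flux_ratio)

# 14th magnitude target, 22nd magnitude binary with 50% deep eclipses
depth = observed_depth(delta_mag=22.0 - 14.0, d_back=0.5)
print(round(depth * 1e6), "ppm")  # prints 315 ppm
```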
There are several ways to use Kepler pixel data to measure the distance from the target star to the transit source. We focus on three classes of techniques, each of which has its strengths and weaknesses. As we describe in detail below, none of these techniques works well in all circumstances because of systematic error sources that vary from technique to technique and situation to situation, but we find that the combination of these techniques covers the majority of cases where there is sufficient S/N to measure the transit source location. Because of the large number of objects in the Kepler data, our focus is on techniques that can be reliably automated. We would also, when possible, like to associate the transit source with a known star. Therefore, we describe techniques that provide an estimate of the transit source location on the sky rather than simply determining if the transit source is at the target star location.
Kepler collects pixels specific to each target (Bryson et al. 2010b). A subset of these pixels, called the photometric optimal aperture, is summed to create the light curve for the target (see Fig. 2). The pixel analysis in this paper uses either the optimal aperture plus one halo of pixels, defined as any pixel adjacent to the optimal aperture (the photometric centroid technique described in § 2), or all pixels collected for a target (the difference image technique described in § 3). For most targets, Kepler pixel data are collected once every “long cadence” (29.4 minutes), and for a subset of targets, data are collected once every “short cadence” (0.98 minutes). In this paper we limit our discussion to long cadence observations.

Fig. 2.— Pixels collected for a Kepler target. All collected pixels are outlined by the solid white line. The photometric optimal aperture is outlined with a white dot-dashed line. The pixel values are shown by the pixel color. Asterisks give the location of known stars in the field, including those just outside the collected pixels. For each star, the Kepler input catalog number and Kepler magnitude are given.
All of the methods described in this paper identify spatially separated false positives by comparing pixel values during in-transit cadences to values of the same pixels during out-of-transit cadences.
Analysis of Kepler pixels to identify the location of the transit relative to the target star has to solve three problems:
- 1. Analyzing the Pixels Within a Cadence. There are various ways that the transit source location can be inferred from pixel data. Some of these methods require the identification of cadences that occur during transit and cadences that do not.
- 2. Combining the Cadences Within a Quarter. The Kepler spacecraft rotates 90° about the photometer boresight every ∼93 days (Koch et al. 2010). Each ∼93 day period is referred to as a quarter. While the Kepler focal plane is approximately symmetric under these 90° rolls, a star falls on different CCDs at different pixel coordinates in different quarters. How in-transit and out-of-transit cadences within a quarter are selected and combined varies from technique to technique.
- 3. Combining the Cadences Across Quarters. Some of the techniques we discuss operate within a single quarter and will deliver different results from quarter to quarter. These results for each quarter must be combined to provide an overall measurement.
There are three classes of methods that we use to solve these problems:
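- 1. Photometric Centroid Shifts. The shift of the photometric centroid of the target’s pixels between in- and out-of-transit cadences is used to infer the location of the transit source, as described in § 2.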
- 2. Difference Imaging. By constructing the difference of the in- and out-of-transit pixel images, a direct image of the transit source can be constructed, as described in § 3. The centroid of this image provides a direct measurement of the location of the transit source.
- 3. Pixel Correlation Images. When the transit signal can be detected in individual pixels via correlation with the photometric transit signal, an image can be constructed where the value of each pixel is given that correlation value, as described in § 4. This is an alternative method of creating a direct image of the transit source, whose centroid provides the transit source location.
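As a rough illustration of the difference imaging idea, the sketch below averages the out-of-transit and in-transit cadences pixel by pixel and subtracts them. It is only a schematic under stated assumptions: the sign convention (out minus in, so that the transit source appears positive), the flat list layout of the pixels, and the function name are all hypothetical, and the cadence selection and PRF-based centroiding of § 3 are not reproduced here.

```python
def difference_image(cadences, in_transit):
    """Mean out-of-transit image minus mean in-transit image.

    cadences:   list of per-cadence pixel lists (same pixel order each time)
    in_transit: list of booleans, True where the cadence is in transit
    """
    n_pix = len(cadences[0])

    def mean_image(select):
        chosen = [img for img, flag in zip(cadences, in_transit) if flag == select]
        return [sum(img[p] for img in chosen) / len(chosen) for p in range(n_pix)]

    out_img = mean_image(False)
    in_img = mean_image(True)
    return [o - i for o, i in zip(out_img, in_img)]

# Three pixels: a bright target in pixel 0 and a faint eclipsing star in pixel 1
cadences = [[100, 10, 5], [100, 10, 5], [100, 8, 5], [100, 10, 5]]
in_transit = [False, False, True, False]
diff = difference_image(cadences, in_transit)
print(diff.index(max(diff)))  # prints 1: the flux change is in pixel 1, not on the target
```

In this toy case the target pixel shows no change, so the difference image points away from the target even though the target dominates the total flux.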
These methods assume that the only source of flux variation is the object creating the transit signal. When this assumption is not satisfied, these methods will be subject to systematic error. Such systematic error will, however, be different for the different techniques; so when these methods give inconsistent results, we have an indication that systematic error is present.
In contrast to the photometric centroid method, which is based on measured centroid shifts, the difference and pixel correlation image methods produce images that directly show the transit source. While the location of the transit source can then easily be determined via photometric centroids of these images, we use a more robust centroid method based on fitting the Kepler pixel response function (PRF) (Bryson et al. 2010a). The PRF characterizes how light from a single star is spread across several pixels, so it is essentially the system point-spread function (PSF), composed of the optical PSF convolved with pixel structure and pointing behavior over a Kepler long cadence. Given a star’s location on the pixels (including subpixel position), the PRF provides the contribution of that star’s flux to the nearby pixel values. Section 3.3 describes how the PRF is fit to pixel images to determine the location of the transit source.
These three methods are in principle very similar, but have different responses to systematics and noise, transit S/N, and field crowding. The use of all three methods provides increased sensitivity and confidence in the identification of background false positives, particularly when the transit S/N is low.
1.3. Role of Offset Analysis in Planet Candidate Vetting
The techniques described in this paper are used to decide whether or not a detected transit signal belongs on the Kepler planetary candidate list. These techniques have been applied to Kepler planetary candidate vetting (Borucki et al. 2011a,b; Batalha et al. 2010a, 2012; C. Burke et al. 2013, in preparation) with improved reliability and accuracy over time. The approach that eventually evolved is to identify those targets that show a significant offset between the target star and the transit source, relying primarily on the difference imaging method. Those targets that have a borderline significant source offset or have other cause for concern are examined using all the methods described in this paper, including manual examination of the pixels. Targets that have a confirmed offset from the transit source are identified as false positives. This disposition has changed over time for a small number of targets, as the techniques described in this paper have become more refined and as more data have become available, resulting in greater measurement precision. The details of how these analyses were applied are described in papers detailing the release of planetary candidate lists (Borucki et al. 2010a, 2011a,b; Batalha et al. 2012; C. Burke et al. 2013, in preparation).
We give here a brief history of this evolution. Borucki et al. (2010a) used photometric centroid time series analyzed via the cloud plots described in § 2.1 and an early version of difference images. These difference images were visually examined rather than centroided, so offsets from the target star on the order of a pixel (4″) or larger were identified. Borucki et al. (2011a) and Borucki et al. (2011b) used the difference image method including PRF centroiding described in § 3 without the multiquarter averaging, so each quarter was examined individually. Difference imaging with the multiquarter averaging (§ 3.4.1), joint-multiquarter PRF fits (for low S/N targets) (§ 3.4.2), and pixel correlation images (§ 4) were used in Batalha et al. (2012). C. Burke et al. (2013, in preparation) relied more strongly on multiquarter averaged difference imaging and photometric offsets. Joint-multiquarter PRF fits and pixel correlation images were disabled in C. Burke et al. (2013, in preparation) because of computational limitations. These limitations will be overcome in the future by moving the Kepler analysis pipeline to supercomputer platforms.
Because of the evolution towards the techniques described in this paper, the quality of background false-positive identification has changed over time. Therefore, the tables published in the early papers listed above have less accurate background false-positive identification than the later papers. This is reflected in the tables in the Kepler archives, so care must be taken when performing statistical analysis with these tables. At the time of this writing an effort is underway to recheck all KOIs using the methods described in this paper, as well as improved light curve analysis to identify nonbackground false positives such as grazing binary stars.
2. SOURCE LOCATION FROM PHOTOMETRIC CENTROID SHIFTS
“Photometric centroids” compute the “center of light” of the pixels associated with a target. When a transit occurs, the photometric centroid will shift, even when the transit is on the target star (the ideal case of a transit on a target star exactly in the center of a symmetric aperture with uniform background, which is required for there to be no centroid shift, is never realized in practice). As described in this section, we use this shift to infer the location of the transit source, from which we can compute the transit source offset from the target star. This method works well when the target star is crowded by many field stars but suffers from high sensitivity to variable flux not associated with the transit, such as stellar variability and photometric noise. As described in § 2.3.1, due to the implementation of the Kepler processing pipeline, this method tends to overestimate the distance of the transit source from the target star when the transit source is at the edge of the target star’s pixels.
2.1. Computing Pixel Centroids
The most traditional method for estimating the position of a light source is that of photometric centroids, also known as flux-weighted centroids. Photometric centroids measure the “center of light” of all flux in the pixels. While photometric centroids do not exactly measure the location of any particular star, it will be shown below that under idealized circumstances they can be used to compute the location of a transit source.
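The row and column photometric centroids of the pixels for each target are computed for each cadence as:

$$C_{\rm row} = \frac{\sum_j r_j b_j}{\sum_j b_j}, \qquad C_{\rm col} = \frac{\sum_j c_j b_j}{\sum_j b_j}, \tag{2}$$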
where $b_j$ is the flux in pixel $j$ at row and column $(r_j, c_j)$. If we denote the covariance matrix of the pixel values $b_j$ as $\Sigma_{ij}$ (so the uncertainties in the pixel values are the square root of the diagonals: $\sigma_j = \sqrt{\Sigma_{jj}}$), then the standard propagation of errors gives the uncertainty in the photometric row centroid as:

$$\sigma^2_{C_{\rm row}} = \sum_{i,j} \frac{\partial C_{\rm row}}{\partial b_i}\,\Sigma_{ij}\,\frac{\partial C_{\rm row}}{\partial b_j} = \frac{\sum_{i,j} (r_i - C_{\rm row})\,\Sigma_{ij}\,(r_j - C_{\rm row})}{\left(\sum_k b_k\right)^2}, \tag{3}$$

with a similar formula for the uncertainty in the column centroid. We see that the sensitivity of the centroid value $\sigma_{C_{\rm row}}$ is proportional to the square root of the elements of the covariance matrix $\Sigma_{ij}$, in particular to the uncertainty in the pixel values $\sigma_j$, divided by the total flux in the pixels $\sum_k b_k$. Therefore, photometric centroids are very sensitive to variations in pixel value, in particular to shot noise and stellar variability.
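The flux-weighted centroid and its propagated uncertainty can be sketched in a few lines. This is an illustrative sketch rather than the pipeline implementation: it uses the derivative of the centroid with respect to each pixel value, (x_j − C) / Σ_k b_k, together with a full pixel covariance matrix, and the example assumes a diagonal covariance equal to the flux (pure shot noise in arbitrary units). All names are hypothetical.

```python
import math

def flux_weighted_centroid(coords, flux):
    """Center of light along one axis: sum(x_j * b_j) / sum(b_j)."""
    return sum(x * b for x, b in zip(coords, flux)) / sum(flux)

def centroid_sigma(coords, flux, cov):
    """Propagate the pixel covariance matrix into the centroid uncertainty."""
    total = sum(flux)
    centroid = flux_weighted_centroid(coords, flux)
    # Partial derivative of the centroid with respect to each pixel value
    weights = [(x - centroid) / total for x in coords]
    n = len(flux)
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

# A 2 x 2 pixel aperture with most flux in the (row 1, col 1) pixel
rows = [0, 0, 1, 1]
cols = [0, 1, 0, 1]
flux = [1.0, 1.0, 1.0, 3.0]
# Assumed shot-noise-like covariance: variance equal to flux, no correlations
cov = [[flux[i] if i == j else 0.0 for j in range(4)] for i in range(4)]

c_row = flux_weighted_centroid(rows, flux)
c_col = flux_weighted_centroid(cols, flux)
sigma_row = centroid_sigma(rows, flux, cov)
print(round(c_row, 4), round(c_col, 4), round(sigma_row, 4))
```

Scaling every flux value up by a common factor while keeping shot-noise variances shrinks the centroid uncertainty, which is the dependence on total flux noted above.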
For photometric centroids computed in the Kepler pipeline, $j$ ranges over the optimal aperture plus a single ring of pixels (sometimes called a “halo”). The result is a time series containing the row and column centroids, called “centroid time series.” The centroid shift is defined as the centroid value for cadences out of transit, $C_{\rm out}$, subtracted from the centroid value for cadences in transit, $C_{\rm in}$: $\Delta C = C_{\rm in} - C_{\rm out}$. We assume shifts in different cadences are uncorrelated, so these shifts have an uncertainty given by $\sigma_{\Delta C} = \sqrt{\sigma^2_{C_{\rm in}} + \sigma^2_{C_{\rm out}}}$.
It is very important to distinguish between the “centroid shift,” which measures how far the centroid moves between in- and out-of-transit cadences, and the “source offset,” which measures the separation of the target star from the transit source. As we will describe below, the centroid shift and source offset are related but measure very different things. The centroid shift measures the change in the photometric centroid due to all changes in flux in the aperture. The source offset is derived from the centroid shift but measures the separation between the target star and the transit source (which may or may not be a different star). In particular, because there is always background flux and field stars, the centroid shift ΔC will always be nonzero even when the transit signal is on the target star. In such cases, the centroid shift can be relatively large while the source offset may be very close to zero.
Low-frequency secular trends due to small, slow changes such as differential velocity aberration, small pointing drifts, and thermally induced focal length changes are common in centroid time series (Christiansen et al. 2012). These trends are removed prior to the analysis described in this section, for example by local median filtering using a window of 48 cadences.
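A local median filter of the kind mentioned above might look like the following sketch. The centered window, the truncation of the window at the ends of the series, and the function name are assumptions made for illustration; the filter actually used in the analysis is not specified beyond the 48-cadence window.

```python
from statistics import median

def median_detrend(series, window=48):
    """Subtract a running median from a centroid (or flux) time series.

    The window is centered on each cadence and spans `window` cadences
    plus the cadence itself; it is truncated at the ends of the series.
    """
    half = window // 2
    n = len(series)
    detrended = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        detrended.append(series[i] - median(series[lo:hi]))
    return detrended

# A slow linear drift is removed away from the edges of the series
drift = [0.001 * i for i in range(200)]
residual = median_detrend(drift)
print(max(abs(v) for v in residual[24:-24]))  # prints 0.0
```

Because a transit occupies only a few cadences of each window, the running median largely ignores it, so short transit-related shifts survive the detrending while slow drifts are removed.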
To facilitate combining the centroids across quarters, the centroid time series is converted to celestial right ascension (R.A.) and declination (decl.) using the Kepler focal plane geometry model in combination with motion polynomials that capture local variations in the focal plane geometry model (Tenenbaum & Jenkins 2010). In these coordinates, the centroid shift ΔC is expressed as seconds of arc.
When the centroid shift ΔC is large enough, it can be taken to indicate that the transit source is not on the target star. Using ΔC directly to make this determination must be done with great care, however. The centroid shift ΔC will be smallest when the target star is the source of the transit, the target star is isolated, residual background flux is small after background correction, and the target star is near the geometric center of the centroided pixels. This is rarely the case, however, so even when the target star is the source of the transit, there will be a nontrivial centroid shift. A larger centroid shift that is correlated with the time of transit is an indicator that the transit source may not be the target star. Determining whether a centroid shift indicates that the transit source is not the target star is difficult, however, and depends on the details of other flux sources in the target’s pixel aperture. In § 2.3, we describe how to use the centroid shift to estimate the location of the source of the centroid signal, which is a more robust method for determining whether the transit source is the target star than using the centroid shift alone.
A graphical method showing the correlation between the centroid shift and the transit signal is to plot the median-detrended centroid time series against the normalized, median-detrended light curve flux value. The result is a “cloud plot,” shown in Figure 3. Most points in a cloud plot are out-of-transit cadences and form a cluster around (0, 0). The size of the cloud reflects the sensitivity of the photometric centroid computation to noise in the pixel values. When there is no centroid shift associated with transits, the points in transit (with negative normalized flux) fall directly below the out-of-transit points. When there is a centroid shift associated with the transit, points in transit will fall to the side of the out-of-transit cloud. Seeing sideways motion of the in-transit points, as shown in the right panel of Figure 3, indicates a centroid shift associated with the transit. This suggests that the transit source may be offset from the target star. As explained above, care must be taken when interpreting cloud plots because there may be a nontrivial centroid shift correlated with the transit even when the target star is the transit source.

Fig. 3.— Example cloud plots where the normalized residual flux (y-axis) is plotted against the centroid shift (x-axis). Each point plots the normalized, median-detrended flux value against the median-detrended R.A. (blue crosses) or decl. (red circles) centroid time series in a single long cadence. In both figures, most points are from out-of-transit cadences and form a cloud around (0, 0). Left: When the transit is on an isolated target star (KOI-221 in this example), the centroid does not shift when in transit, so in-transit points are directly below the out-of-transit points. Right: When the transit is on an object offset from the target (KOI-109 in this example), the in-transit centroids are shifted relative to the out-of-transit centroids and appear below to one side, indicating a strong possibility of a background false positive. In this example, the decl. centroid components show a shift while the R.A. components do not, indicating that the transit source is offset in the decl. direction.
2.2. Correlating Centroid Motion with the Transit Model
The centroid time series is sensitive to photometric noise, so quantitatively measuring the correlation of the centroid shift with the photometric transit signal can be difficult, particularly for low S/N transits. A simple approach is to identify all in- and out-of-transit cadences, and compute the average (or median) in- and out-of-transit centroid values. The average centroid shift is then given by the difference of the in- and out-of-transit average centroid locations. This method encounters many difficulties, however: quarter-to-quarter differences in aperture shape will introduce systematic errors, and non-transit-related variability will degrade these averages as measures of transit-related shifts. A better method is to fit a transit model computed during data validation (Wu et al. 2010) to the centroid time series. This will provide a more robust measurement of ΔC.
In this section, we define the centroid shift time series $\Delta C_n = C_n - C_{\rm out}$, where $C_{\rm out}$ is the average out-of-transit centroid and $n$ labels the cadence. We assume that the transit model has been whitened to remove secular variations, such as those due to pointing drift and stellar variability (Wu et al. 2010), in which case the centroid shift time series $\Delta C_n$ must be whitened in the same way. We compute a least-squares fit of the centroid shift time series $\Delta C_n$ to the transit model $M_n$ multiplied by a constant γ, weighted by the centroid uncertainties $\sigma_{\Delta C_n}$. This fit is most easily done by requiring that the transit model and the centroid shift time series both have zero mean when the transit is not occurring. This implies that the transit model $M_n = 0$ for out-of-transit cadences. When this is the case we minimize

$$\chi^2 = \sum_n \frac{\left(\Delta C_n - \gamma M_n\right)^2}{\sigma^2_{\Delta C_n}}.$$

This least-squares minimization problem has the solution

$$\gamma = \frac{\sum_n M_n\,\Delta C_n / \sigma^2_{\Delta C_n}}{\sum_n M_n^2 / \sigma^2_{\Delta C_n}}. \tag{4}$$
Examples of this fit are given in Figures 4 and 5.

Fig. 4.— Example of a fit of the centroid time series to the transit model for a case when the transit source is at the same location as the target star (KOI-221). Top: The detrended flux light curve over all quarters folded on the transit period, with a closeup on the transit. Middle and bottom: The R.A. and decl. detrended centroid shifts ΔC for the same cadences in milliarcseconds. There is no apparent change in the centroid positions at the time of the transit.

Fig. 5.— Example of a fit of the centroid time series to the transit model for a case where the transit source is offset from the target star (KOI-109). Top: The detrended flux light curve over all quarters folded on the transit period, with a closeup on the transit. Middle and bottom: The R.A. and decl. centroid shifts ΔC for the same cadences in milliarcseconds. There is a readily apparent change in the centroid shifts at the time of the transit, particularly in decl. The transit model that best fits the flux light curve is superimposed on each centroid shift plot, scaled by the coefficient γ in equation (4). The value of ΔC = γ in declination is about 0.1 mas. The poor model fit is due to the fact that the transit source for KOI-109 is in fact a deep eclipsing binary, while the model assumes a planetary transit.
Assuming that the centroid and transit model uncertainties are uncorrelated over time, and neglecting uncertainties in the transit model values, the uncertainty in γ is

$$\sigma_\gamma = \left(\sum_n \frac{M_n^2}{\sigma^2_{\Delta C_n}}\right)^{-1/2}. \tag{5}$$
Only in-transit cadences contribute to the computation of γ and $\sigma_\gamma$ because $M_n = 0$ for out-of-transit cadences. Because $M_n$ is fit to the whitened and normalized flux light curve, it has unit variance, so γ is in the same units as $\Delta C_n$ and directly gives an estimate of the in- versus out-of-transit shift: ΔC ≈ γ. When the centroid shifts are in R.A. and decl. coordinates, all quarters of data can be simultaneously fit. From equation (5) we see a $1/\sqrt{N_{\rm in}}$ reduction in the uncertainty, where $N_{\rm in}$ is the total number of in-transit cadences, so combining many quarters increases the precision of the estimate of ΔC in each coordinate.
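The weighted fit of the centroid shift time series to the transit model has a closed-form solution, sketched below. This is a minimal illustration, not the data validation code: γ is the ratio of the weighted sums Σ M ΔC/σ² and Σ M²/σ², and its uncertainty is the inverse square root of Σ M²/σ². The example uses a hypothetical box-shaped model equal to 1 in transit and 0 elsewhere, an assumed normalization under which γ is simply the weighted mean in-transit shift.

```python
import math

def fit_centroid_shift(delta_c, model, sigma):
    """Weighted least-squares amplitude of `model` in the series `delta_c`.

    delta_c: centroid shift per cadence
    model:   transit model per cadence (zero out of transit)
    sigma:   centroid shift uncertainty per cadence
    Returns (gamma, sigma_gamma).
    """
    numerator = sum(m * d / s ** 2 for m, d, s in zip(model, delta_c, sigma))
    denominator = sum(m * m / s ** 2 for m, s in zip(model, sigma))
    return numerator / denominator, 1.0 / math.sqrt(denominator)

# Five cadences, two of them in transit with a shift of 2 (arbitrary units)
delta_c = [0.0, 0.0, 2.0, 2.0, 0.0]
model = [0.0, 0.0, 1.0, 1.0, 0.0]
sigma = [0.5] * 5

gamma, sigma_gamma = fit_centroid_shift(delta_c, model, sigma)
print(gamma, round(sigma_gamma, 4))  # prints 2.0 0.3536
```

With two in-transit cadences of equal uncertainty 0.5, the fitted uncertainty is 0.5/√2, showing the 1/√N scaling with the number of in-transit cadences.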
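Once the shift is estimated in R.A. and decl. (in seconds of arc), the shift distance is simply

$$D = \sqrt{\Delta C_\alpha^2 + \Delta C_\delta^2}, \tag{6}$$

with uncertainty

$$\sigma_D = \frac{\sqrt{\Delta C_\alpha^2\,\sigma^2_{\Delta C_\alpha} + \Delta C_\delta^2\,\sigma^2_{\Delta C_\delta}}}{D}. \tag{7}$$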
A high-level detection statistic indicating whether a detected shift is statistically significant is also computed. This statistic measures the probability that the detected shift is due to an actual signal rather than a statistical fluctuation in white noise by subtracting the residual χ² from the signal χ². From this statistic, a significance metric is constructed that is normalized to the range [0, 1], where one means that there is no detected shift and zero means that the shift is highly significant. This is equivalent to equation (4) of Wu et al. (2010), which in our notation is given by

2.2.1. Impact of Crowding and Variability on the Centroid Shift Estimate
The computation of the in-transit centroid shift assumes that the transiting object is the only source of time-varying flux that is correlated with the transit signal in the target star’s pixels. While this is usually a reasonable assumption, it is sometimes violated, introducing systematic error into the centroid shift estimate. A dramatic example is KOI-1860, whose pixels are shown in Figure 6. In this case, there is a field star that is 2.7 mag brighter than the target star at the edge of the collected pixels. Examination of the pixel flux time series shows that this bright star has moderately high variability on short time scales. In addition, because this bright star is at the edge of the collected pixels and is only partially captured, there are strong variations in flux due to spacecraft pointing jitter. The effect of these variations on the centroid time series is shown in Figure 7. These variations are on a time scale that occasionally correlates with the transit signal, leading to a small spurious measured centroid shift in the fit in equation (4). The reconstructed transit source location using this spurious shift measurement, described in § 2.3, indicates a transit source separated from the target star by about 4″. As we will see in § 3.4.1, however, the PRF-fit technique provides strong evidence that the transit source is only about a third of an arcsecond from the target star.

Fig. 6.— Pixels collected for KOI-1860 in quarter 10. The pixels are dominated by the field star KIC 4157320 which is 2.7 mag brighter than the target star. KIC 4157320 has strong variability. In addition, because it is only partially captured in the pixels, spacecraft pointing variations are apparent in the pixel flux light curves.

Fig. 7.— The (not folded) flux and photometric centroid time series for KOI-1860 in quarter 10. The vertical red lines indicate times of transit. The bright field star at the edge of the aperture (see Fig. 6) causes strong variations in the centroid time series due to the intrinsic variability of that star combined with spacecraft pointing jitter, which is exacerbated by that star being only partially captured in the pixels. These variations cause a spurious centroid shift that is correlated with the transit signal.
2.3. Estimating the Transit Source Location from Centroid Motion
Photometric centroids are the weighted average of all flux in the target star’s pixels, so they do not provide direct information about the location of the target star or the transit source. In particular, as explained in § 2.1, a statistically significant shift does not necessarily imply that the transit source is offset from the target star. In the
We denote the R.A. and decl. components of the average out-of-transit centroid as (
,
) and the centroid shift measured as described in § 2.2 as (ΔCα, ΔCδ). If the observed transit depth is dobs, then, as shown in the

(see Fig. 8). When all flux from the transit source is captured in the aperture, then this centroid gives the location of the transit source.

Fig. 8.— Illustration of the relationship between centroids, centroid shifts, the background eclipsing binary causing the transit signal, and the target star in equation (9) for an otherwise empty aperture. The photometric centroid when a transit is not occurring is given by Cout (filled circle). If the transit is due to an eclipse on the background star, the centroid will shift during the eclipse towards the target star to Cin (open circle). The resulting transit shift is ΔC = Cin - Cout. Applying equation (9) gives an estimate of the transit source location (filled square), which in an idealized case will correspond to the location of the transit source.
The formal uncertainty in the source position is given in terms of the centroid uncertainty σCα and depth uncertainty σdobs by


These uncertainties do not account for systematic error due to other sources of varying flux.
For dobs ≪ 1, equation (9) reduces to

the approximation given in equation (2) of Wu et al. (2010). The uncertainties are similarly approximated by replacing (1/dobs - 1) by 1/dobs. This approximation has an error that is proportional to dobs, which is very small for most Kepler planetary candidates.
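The relation between the centroid shift and the source position can be checked numerically for the idealized two-star case of Figure 8. The following minimal Python sketch assumes the form source = Cout - (1/dobs - 1) ΔC, which is inferred here from the (1/dobs - 1) factor quoted above and the definition ΔC = Cin - Cout. It works in local tangent-plane arcseconds, so no cos δ factor appears, and all function names and numerical values are illustrative rather than taken from the Kepler pipeline.

```python
def centroid(positions, fluxes):
    """Flux-weighted centroid of point sources at (x, y) positions in arcsec."""
    total = sum(fluxes)
    cx = sum(f * p[0] for p, f in zip(positions, fluxes)) / total
    cy = sum(f * p[1] for p, f in zip(positions, fluxes)) / total
    return (cx, cy)


def source_from_centroid_shift(c_out, delta_c, d_obs):
    """Assumed relation: source = C_out - (1/d_obs - 1) * delta_C."""
    k = 1.0 / d_obs - 1.0
    return (c_out[0] - k * delta_c[0], c_out[1] - k * delta_c[1])


def source_small_depth(c_out, delta_c, d_obs):
    """Small-depth approximation: (1/d_obs - 1) replaced by 1/d_obs."""
    return (c_out[0] - delta_c[0] / d_obs, c_out[1] - delta_c[1] / d_obs)


# Hypothetical scene: target at the origin, fainter eclipsing binary 6" away.
positions = [(0.0, 0.0), (6.0, 0.0)]
flux_out = [1000.0, 50.0]
d_back = 0.2                                  # eclipse depth on the background star
flux_in = [1000.0, 50.0 * (1.0 - d_back)]

c_out = centroid(positions, flux_out)
c_in = centroid(positions, flux_in)
delta_c = (c_in[0] - c_out[0], c_in[1] - c_out[1])
d_obs = 1.0 - sum(flux_in) / sum(flux_out)    # observed fractional depth

exact = source_from_centroid_shift(c_out, delta_c, d_obs)
approx = source_small_depth(c_out, delta_c, d_obs)
print("exact:", exact, "small-depth approximation:", approx)
```

In this noiseless example with all flux captured, the first estimate recovers the background star position, while the small-depth form overshoots by |ΔC|, which is an error of order dobs relative to the offset.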
Once we have the centroid source location from equation (9), we compare it with the target location to determine the source offset. The target star location cannot, however, be reliably determined from the centroid time series, so we take the target star position from the Kepler input catalog. This choice potentially introduces new sources of systematic error, particularly due to unknown proper motion.
Given the target star’s catalog location (αtarget, δtarget), we can compute the target offset and uncertainty from the offset components Δα = (αtransit - αtarget) cos δ and Δδ = δtransit - δtarget as

where
and
.
We can now determine if the transit source is statistically significantly offset from the target star by observing whether D > 3σD.
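The offset test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the offset distance is taken as the Euclidean norm of (Δα, Δδ), its uncertainty comes from first-order propagation with uncorrelated components, and cos δ is evaluated at the target declination. None of these choices is confirmed by the text above, and the function names are hypothetical.

```python
import math


def source_offset(ra_transit, dec_transit, ra_target, dec_target,
                  sigma_dalpha, sigma_ddelta):
    """Offset distance D and uncertainty sigma_D in arcsec.

    Positions are in degrees; sigma_dalpha and sigma_ddelta are the 1-sigma
    uncertainties (arcsec) of the two offset components.
    """
    cos_dec = math.cos(math.radians(dec_target))      # assumed choice of delta
    d_alpha = (ra_transit - ra_target) * cos_dec * 3600.0
    d_delta = (dec_transit - dec_target) * 3600.0
    dist = math.hypot(d_alpha, d_delta)
    if dist == 0.0:
        # First-order propagation is undefined at zero offset; use a fallback.
        return 0.0, max(sigma_dalpha, sigma_ddelta)
    sigma_d = math.sqrt((d_alpha * sigma_dalpha) ** 2 +
                        (d_delta * sigma_ddelta) ** 2) / dist
    return dist, sigma_d


def is_significant(dist, sigma_d, n_sigma=3.0):
    """True when the offset exceeds n_sigma times its uncertainty."""
    return dist > n_sigma * sigma_d


# Hypothetical source 4" east and 3" north of a target at decl. +45 deg.
ra0, dec0 = 290.0, 45.0
ra1 = ra0 + 4.0 / 3600.0 / math.cos(math.radians(dec0))
dec1 = dec0 + 3.0 / 3600.0
D, sigma_D = source_offset(ra1, dec1, ra0, dec0, 0.5, 0.5)
print(D, sigma_D, is_significant(D, sigma_D))
```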
2.3.1. Systematic Errors in the Source Position Estimate
As discussed in the

Fig. 9.— Optimal aperture compared with the pixels used for photometric centroiding. The optimal aperture pixels are outlined by the dot-dashed line, while the pixels used for photometric centroiding are outlined by the dashed line.
In the typical background false-positive case when the transit source is associated with a field star that is significantly dimmer than the target star, the observed depth in the optimal aperture (the depth computed by the Kepler pipeline) will be smaller than the depth that would have been observed using the centroided pixels. This will result in an overestimate of the distance of the transit source from the out-of-transit photometric centroid Cout in equation (9). Occasionally the field star associated with the transit source will be brighter than the target star, so the flux from the field star dominates the centroids. In this case, the observed depth in both apertures will be similar, resulting in less of an overshoot. This behavior is observed in § 6.1. See the
The dependence of the source offset estimate on the ratio of the brightness of the background star to that of the target star is shown in Figure 10. This example is similar to that in Figure 6, where the background star causing the transit signal is outside the optimal aperture and mostly, but not completely, captured in the centroided pixels. When the background star is dim, the estimated transit source overshoots the correct offset. When the background star is significantly brighter than the target star, then the flux from the background star dominates the depth estimate, so the depth based on the centroided pixels is about the same as the depth based on the optimal apertures. But because the background star is close to the edge of the centroided pixels, not all flux from the background star is captured. Therefore, the source offset estimate in equation (9) gives the centroid of the flux in the pixels from the background star, which is closer to the target star than the background star itself.

Fig. 10.— Photometric-based transit source offset as a function of the ratio of the background source brightness to the target star brightness. The example shown here is for a 0.1% transit on a background star that is 10″ from the target star. The optimal aperture in this case is 2 × 2 Kepler pixels (7.96″ × 7.96″), so the background star is outside the optimal aperture in the halo pixels. Because significant flux from the background star falls outside the captured pixels, the source position estimate (eq. [9]) underestimates the actual position of the background star.
3. DIFFERENCE IMAGING
The “difference imaging technique” computes the difference between average in- and out-of-transit pixel values. These pixel differences provide an image of the transit source at its true location. A centroid of this “difference image” provides the location of the transit source. To measure this centroid, we fit the Kepler pixel response function (PRF), looking for the PRF position that best matches the difference pixels. We compare this position to the PRF fit to the out-of-transit position, which provides the target star position when it is not crowded by field stars. The difference of these centroids gives us the offset of the transit signal from the target star. This method is more robust against photometric variability than the photometric centroid method, but is sensitive to scene crowding.
3.1. Concept of Difference Imaging
The difference image technique is based on the insight that subtracting the in-transit pixel values from the out-of-transit pixel values gives an image that shows only those pixels that have changed during the transits. Further, if the changes during transits are due to a change in brightness of a star (as is the case for a planetary transit or an eclipsing binary), then the bright pixels in the difference image will be those of that star with flux given by the fractional transit depth times the flux of that star.
More precisely, consider a set of pixels that contain flux from M stars, labeled by the index j, at locations (αj, δj) with flux bj (we neglect background flux in this simple analysis). The PSF will distribute the flux from each of these stars over several pixels. We express the flux on the pixel at row r and column c due to star j by the unit flux function f(αj, δj, r, c) [so the sum over all pixels of f(αj, δj, r, c) = 1]. Then, the out-of-transit pixel values due to all stars will be given by
$$F_{\rm out}(r,c) = \sum_{j=1}^{M} b_j\, f(\alpha_j,\delta_j,r,c).$$
If star k has a transit of depth dback, then during midtransit, the pixel values would be given by
$$F_{\rm in}(r,c) = \sum_{j \neq k} b_j\, f(\alpha_j,\delta_j,r,c) + (1 - d_{\rm back})\, b_k\, f(\alpha_k,\delta_k,r,c).$$
In the ideal case where the only flux change is in star k, the difference image will be Fout(r,c) - Fin(r,c) = dback bk f(αk, δk, r, c), which is exactly the image of star k with flux dback bk.
Difference images provide direct information about the location of the transit source, as opposed to the use of photometric centroids in § 2.1, where the source location is inferred.
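The cancellation of non-varying stars in the difference image can be illustrated with a small synthetic scene. The Python sketch below is an assumption-laden toy: a circular Gaussian sampled at pixel centers and normalized over the grid stands in for the unit flux function f, positions are given in pixel units rather than sky coordinates, and the star fluxes and transit depth are invented for illustration.

```python
import math


def unit_flux_image(x0, y0, n_rows, n_cols, width=1.0):
    """Gaussian stand-in for f, normalized so the pixel values sum to 1."""
    raw = [[math.exp(-((c - x0) ** 2 + (r - y0) ** 2) / (2.0 * width ** 2))
            for c in range(n_cols)] for r in range(n_rows)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def scene(stars, n_rows, n_cols):
    """Pixel image of a list of (column, row, flux) stars."""
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for x0, y0, flux in stars:
        f = unit_flux_image(x0, y0, n_rows, n_cols)
        for r in range(n_rows):
            for c in range(n_cols):
                image[r][c] += flux * f[r][c]
    return image


def brightest_pixel(image):
    """(row, column) of the largest pixel value."""
    coords = ((r, c) for r in range(len(image)) for c in range(len(image[0])))
    return max(coords, key=lambda rc: image[rc[0]][rc[1]])


N_ROWS, N_COLS = 8, 9
stars = [(3.0, 3.0, 1000.0), (6.0, 2.0, 80.0)]   # target, then faint neighbor
k, d_back = 1, 0.3                               # neighbor dims by 30% in transit

in_transit = [(x, y, b * (1.0 - d_back) if j == k else b)
              for j, (x, y, b) in enumerate(stars)]
f_out = scene(stars, N_ROWS, N_COLS)
f_in = scene(in_transit, N_ROWS, N_COLS)
diff = [[o - i for o, i in zip(ro, ri)] for ro, ri in zip(f_out, f_in)]

# Image of star k alone with flux d_back * b_k, for comparison.
expected = scene([(6.0, 2.0, d_back * 80.0)], N_ROWS, N_COLS)
max_err = max(abs(d - e) for rd, re in zip(diff, expected)
              for d, e in zip(rd, re))
print(brightest_pixel(f_out), brightest_pixel(diff), max_err)
```

The out-of-transit image peaks on the bright target, while the difference image peaks on the faint neighbor, which is the signature of a background false positive.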
Example pixel images are shown in Figures 11 and 12. In Figure 11, we see an example of a star (KOI-221) for which there is no apparent offset between the target star and the transit source. In this case, the difference image looks much like the in- and out-of-transit images, likely because the target star is itself the source of the transit (and there are no other stars of comparable brightness in the out-of-transit image). Therefore, the only difference between the difference image and the out-of-transit image is the flux level in the pixels. Figure 12 shows a case (KOI-109) where the difference image is dramatically different from the out-of-transit image and appears as a star image coincident with the dim unclassified star KIC 4752452. Because KIC 4752452 is unclassified, it does not have a Kepler magnitude. In this case, the pixel data show that the transit source is clearly not on the target star.

Fig. 11.— Example pixel images for KOI-221 in quarter 7, which shows no indication that the transit is not on the target star. In all figures, the dotted white line borders the pixels of the optimal aperture, while the solid white line borders all pixels collected for this target. Known stars are shown as white asterisks, with each star’s KIC number and Kepler magnitude. Upper right: Averaged out-of-transit pixel image. Lower left: In-transit pixel image. Upper left: Difference image = out-of-transit pixel image − in-transit pixel image. Lower right: Difference image normalized by pixel value uncertainty. In this case, the difference image appears identical to the in- and out-of-transit images, which indicates that the transit source is coincident with the target star.

Fig. 12.— Example pixel images for KOI-109 in quarter 4, which shows indications that the transit is not on the target star. Upper right: Averaged out-of-transit pixel image. Lower left: In-transit pixel image. Upper left: Difference image = out-of-transit pixel image − in-transit pixel image. Lower right: Difference image normalized by pixel value uncertainty. In this case, the difference image appears to be very different from the in- and out-of-transit images, which indicates that the transit source is coincident with the star KIC 4752452.
When the transit S/N is high, the pixel images appear as in Figures 11 (S/N = 378) and 12 (S/N = 101), with very well defined star-like difference images. When the S/N is high and the transit is on the target star, as in Figure 11, we expect the difference image to look like the out-of-transit image. Figure 13 shows an example of a low S/N transit on KOI-2949 with an S/N of 11. In this figure, the difference image looks significantly different from the out-of-transit image, so a cursory inspection of only this quarter’s out-of-transit and difference images would indicate a significant offset. But examination of other quarters finds offsets in other directions in some quarters and much smaller offsets in other quarters. When the S/N is low, the difference image is subject to pixel-level systematics that can pollute the difference image. As we will see in § 3.4, combining quarters puts the transit source statistically close to the target. When the S/N is very low, the difference image is dominated by noise because the transit does not have sufficient signal in individual quarters.

Fig. 13.— Pixel images for a low S/N transit on KOI-2949 with an S/N of 11. The difference image appears significantly different from the out-of-transit image in this quarter, indicating that the transit source is not on the target star. But other quarters show the transit source in other locations, including on the target star. This situation is typical for low S/N transits, and more reliable measurement of the transit source location can be attained by combining the quarters as described in § 3.4. In this example, the combined quarter result indicates that the transit location is statistically consistent with the target star.
When the offset is as dramatic as that in Figure 12, cursory visual inspection is sufficient to determine that the transit signal does not occur on the target star. We are interested, however, in measuring smaller offsets that may not be so visually obvious. In addition, we wish to have the ability to automatically measure and detect such transit-source offsets for thousands of transit signals. This can be done by measuring the centroid of the difference image and comparing with estimates of the target star position. This approach encounters several difficulties:
- 1. Difference images can be noisy, particularly for low S/N transits. This is particularly a problem for transits near spacecraft thermal events and in multiple planet systems, where the transit signals from multiple planets can interfere with each other.
- 2. Determination of the location of the target star should use the same method as the difference image to minimize the impact of systematic measurement errors.
- 3. The structure of the background signal for the target star due to crowding will be very different from the difference image background signal because non-variable background stars will cancel out in the difference image.
- 4. In different quarters, stars fall in different places on different pixels, and pixel apertures vary from quarter to quarter. Therefore, the offsets measured in different quarters can be different.
We address these difficulties through the following strategies:
- 1. Careful construction of the in- and out-of-transit images, described in § 3.2, so the difference image is as clean as possible.
- 2. Determining the location of stars in the difference or out-of-transit image via PSF-type fitting to the pixel data using the Kepler pixel response function (PRF), described in § 3.3, which is more robust against noise than photometric centroids.
- 3.
3.2. Construction of In- and Out-of-Transit and Difference Pixel Images
Our goal is to measure the location of the change in the flux due to the transit signal. Therefore, we want to create a difference image by subtracting pixel flux in transit from pixel flux near transit. We want to avoid pixel flux away from the transit so changes due to stellar variability are less likely to enter into the difference image. We also want to avoid changes in flux that are not related to the transit under examination, such as spacecraft thermal or pointing events or transits due to other planets orbiting the target star in multiple systems. We minimize noise by averaging as many in- and out-of-transit measurements as possible, subject to these constraints.
In each quarter, Kepler collects about 4300 long cadences, from which in- and out-of-transit exposures need to be identified. We use the (unwhitened) transit model Mn constructed in data validation (Wu et al. 2010) to select these cadences.
In-transit cadences are defined as those cadences where the model is less than a threshold proportional to the model transit depth. The current threshold is 3/4 of the transit depth: when the model is normalized so that Mn = 0 for out-of-transit cadences, in-transit cadences are those for which the model values Mn < -(3/4)d, where d is the modeled fractional transit depth.
The out-of-transit cadences are chosen near each transit under the following criteria:
- 1. Out-of-transit cadences are chosen on both sides of the transit so that an average of these out-of-transit cadences removes any locally linear secular trends.
- 2. Not too many cadences are chosen, so that nonlinear variability on time scales longer than the transit remains small.
- 3. Out-of-transit cadences should not be too close to the transit.
The number of out-of-transit cadences Nout is chosen as the number of cadences that occur during the entire transit duration where Mn < 0. This is generally not the same as Nin. The out-of-transit cadences are chosen to lie more than Nbuffer cadences from the cadences for which Mn < 0. Figure 14 shows an example of selected cadences for a typical transit.
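The selection rules above can be sketched as follows. This Python fragment is a simplified illustration, not the data validation code: it assumes the model is a plain list with Mn = 0 out of transit, interprets "more than Nbuffer cadences from" as leaving exactly Nbuffer unused cadences on each side, takes Nout cadences on each side, and does not yet check for gaps or interfering transits. The function name and the toy model values are hypothetical.

```python
def select_cadences(model, depth, n_buffer=3, threshold=0.75):
    """Return one (in_transit, out_of_transit) pair of index lists per transit."""
    transits = []
    n = len(model)
    i = 0
    while i < n:
        if model[i] < 0.0:
            start = i
            while i < n and model[i] < 0.0:
                i += 1
            end = i - 1                       # last cadence with model < 0
            n_out = end - start + 1           # N_out = full transit width
            # In-transit: model below -(3/4) of the transit depth.
            in_idx = [j for j in range(start, end + 1)
                      if model[j] < -threshold * depth]
            # Out-of-transit: n_out cadences on each side, beyond the buffer.
            before = range(start - n_buffer - n_out, start - n_buffer)
            after = range(end + n_buffer + 1, end + n_buffer + 1 + n_out)
            out_idx = [j for j in list(before) + list(after) if 0 <= j < n]
            transits.append((in_idx, out_idx))
        else:
            i += 1
    return transits


# Toy model: one six-cadence transit of fractional depth 0.01.
depth = 0.01
model = [0.0] * 40
for offset, shape in enumerate([0.3, 0.8, 1.0, 1.0, 0.8, 0.3]):
    model[17 + offset] = -shape * depth

selected = select_cadences(model, depth)
print(selected)
```

With these toy values the sketch selects four in-transit cadences and six out-of-transit cadences on each side, the same counts as in the Figure 14 example.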

Fig. 14.— Example of in- and out-of-transit cadence selection (KOI-221). Top: Transit model Mn for a selected cadence range in quarter 6. The x-axis shows the cadences since the beginning of the Kepler science operations. The circles at the bottom of the transit show the cadences that were chosen for the in-transit image. In the transit, Nin = 4 cadences were chosen because they are below the threshold described in the text. The circles outside the transit show the cadences chosen for the out-of-transit image. The full transit is six cadences wide so Nout = 6 cadences were chosen on both sides of the transit. The out-of-transit cadences are Nbuffer = 3 cadences from the transit. Bottom: Actual transit in one of the brighter pixels. The x-axis shows the cadences since the beginning of quarter 6.
After in- and out-of-transit cadences are chosen, they are excluded if they are associated with any of the following events:
- 1. Data gaps such as Earth points and safe modes.
- 2. Cadences within a day after major spacecraft thermal events, such as recovery from Earth points and safe modes that significantly change the temperature distribution of the spacecraft and require many hours to return to thermal equilibrium.
- 3. Pointing anomalies such as attitude tweaks and loss of fine-point events.
- 4. Interference by transits from other planets in multiple planet systems. An example of such interference is shown in Figure 15.

Fig. 15.— Example of the interference with cadences chosen for a transit in the Kepler-11 system. Seven out-of-transit points to the right of the transit are excluded because of the interfering transit by the other planet candidate, which causes the entire transit to be excluded from the construction of the average pixel images.
If more than a small number of cadences associated with a transit are excluded, then the entire transit is excluded from the construction of the difference image. This threshold is currently set to zero, so if any cadences are excluded, then the entire transit is excluded. As Kepler detects longer-period transits, for which fewer transits are available, this threshold will be relaxed to one or two excluded cadences per transit.
Once the final set of transits and their in- and out-of-transit cadences are identified, the in-transit pixel values are averaged to produce the in-transit image and the out-of-transit cadences are averaged to produce the out-of-transit image. The pixel values are not whitened or otherwise detrended: we rely on the averaging described in this section to remove local secular trends. First, the average pixel values are computed for each transit; then, each transit’s averaged pixels in a quarter are averaged together to produce the final in- and out-of-transit average pixel images for that quarter. The difference image for the quarter is then the out-of-transit pixel image minus the in-transit pixel image.
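The averaging scheme can be sketched as below. This is a minimal Python illustration under assumptions: pixel data are stored as one list of pixel values per cadence, each transit is a pair of in- and out-of-transit cadence index lists, and excluded cadences are supplied as a simple list. The function names and the toy data are hypothetical.

```python
def mean_image(pixels, cadences):
    """Average the pixel vectors of the given cadences."""
    n_pix = len(pixels[0])
    return [sum(pixels[t][p] for t in cadences) / len(cadences)
            for p in range(n_pix)]


def quarter_images(pixels, transits, bad_cadences, max_excluded=0):
    """Return (in-transit, out-of-transit, difference) images for a quarter."""
    bad = set(bad_cadences)
    per_in, per_out = [], []
    for in_idx, out_idx in transits:
        n_excluded = sum(1 for t in in_idx + out_idx if t in bad)
        if n_excluded > max_excluded:
            continue                          # drop the whole transit
        keep_in = [t for t in in_idx if t not in bad]
        keep_out = [t for t in out_idx if t not in bad]
        if not keep_in or not keep_out:
            continue
        per_in.append(mean_image(pixels, keep_in))      # per-transit averages
        per_out.append(mean_image(pixels, keep_out))
    if not per_in:
        return None
    n_pix = len(pixels[0])
    img_in = [sum(im[p] for im in per_in) / len(per_in) for p in range(n_pix)]
    img_out = [sum(im[p] for im in per_out) / len(per_out) for p in range(n_pix)]
    diff = [o - i for o, i in zip(img_out, img_in)]
    return img_in, img_out, diff


# Toy data: two pixels; pixel 0 has a linear trend and dims by 1 in transit.
transits = [([5, 6], [1, 2, 9, 10]), ([15, 16], [12, 13, 18, 19])]
in_transit = {5, 6, 15, 16}
pixels = [[100.0 + 0.1 * t - (1.0 if t in in_transit else 0.0), 50.0]
          for t in range(20)]

all_used = quarter_images(pixels, transits, bad_cadences=[])
one_dropped = quarter_images(pixels, transits, bad_cadences=[18])
print(all_used[2], one_dropped[2])
```

Because the out-of-transit cadences straddle each transit symmetrically, the linear trend in pixel 0 cancels and the difference image recovers the unit flux change without any detrending.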
3.3. Fitting the Pixel Response Function
In this section, we describe how the Kepler pixel response function (PRF) (Bryson et al. 2010a) is used to provide a robust, high-precision estimate of the target star and transit locations using the average out-of-transit and difference images constructed as described in § 3.2. This technique requires that the target star is several magnitudes brighter than other stars in the out-of-transit pixels and that the transit signal is sufficiently strong in the difference image. In § 3.3.2, we describe a quantitative measure of whether the average images for a given target star have the required properties. Section 3.3.1 describes various ways in which this method can be compromised and discusses mitigation strategies.
The PRF gives the long-cadence brightness of a pixel due to a star at a specified location. The PRF can be thought of as the convolution of the optical PSF with the effects of pointing, subpixel response, and system electronics. In this section, we write the PRF as a unit flux function f(α, δ, ri, ci) so $\sum_{i=1}^{P_{\rm total}} f(\alpha,\delta,r_i,c_i) = 1$, where Ptotal is the number of all pixels that contain flux from a star at sky coordinates (α, δ), and ri and ci are those pixels’ row and column coordinates. If the star has flux b, then the value of a pixel at row ri and column ci due to that star will be pi = b f(α, δ, ri, ci), and the sum of all pixels containing flux from that star is $\sum_{i=1}^{P_{\rm total}} p_i = b$. [In Bryson et al. (2010a), the star location is defined in pixel coordinates rather than sky coordinates. In this paper, we include the projection from sky coordinates to pixel coordinates in the PRF function f.]
Assume we are given a set of P pixel values pi with rows ri and columns ci that form a pixel image. The P pixels need not contain all the flux from the target star, so P may be less than Ptotal. A PRF fit to these pixels is the determination of sky coordinates (αfit, δfit) and flux bfit that minimize the function
$$\chi^2 = \sum_{i=1}^{P} \frac{\left[p_i - b\, f(\alpha,\delta,r_i,c_i)\right]^2}{\sigma_{p_i}^2},$$
where σpi is the uncertainty in the pixel value pi. This fit is performed iteratively via the nonlinear Levenberg-Marquardt algorithm (Levenberg 1944; Marquardt 1963). Formally, this is a three-dimensional fitting problem in the parameters α, δ, and b. The fit to b, however, can be reduced to a linear problem once the position is known, so this problem can be treated as a much faster two-dimensional nonlinear fit in α and δ. In each iteration of the Levenberg-Marquardt algorithm, the pixels pi at (ri, ci) and the fit parameters α and δ are provided to the model function. We first evaluate the uncertainty-normalized Kepler PRF at α and δ, computing $\hat{f}_i = f(\alpha,\delta,r_i,c_i)/\sigma_{p_i}$ for each pixel. The flux b is the linear least-squares fit of the input pixel values pi to the model $b\hat{f}_i$, given by:
$$b = \frac{\sum_{i=1}^{P} \hat{f}_i\, p_i/\sigma_{p_i}}{\sum_{i=1}^{P} \hat{f}_i^{\,2}}.$$
The product $b\hat{f}_i$ is then returned by the model function. The Levenberg-Marquardt algorithm seeks the α and δ that minimize $\sum_{i=1}^{P} \left(p_i/\sigma_{p_i} - b\hat{f}_i\right)^2$ after several iterations. (In the Kepler pipeline, this is implemented as a model function passed to the MATLAB function “nlinfit.”) Once the iteration has converged, providing (αfit, δfit), the final estimate of b can be computed as $b_{\rm fit} = \sum_i \hat{f}_i\, p_i/\sigma_{p_i} \big/ \sum_i \hat{f}_i^{\,2}$, where now $\hat{f}_i = f(\alpha_{\rm fit},\delta_{\rm fit},r_i,c_i)/\sigma_{p_i}$.
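The separation of the linear flux fit from the nonlinear position fit can be illustrated with a small Python sketch. Several assumptions apply and are worth stating loudly: a circular Gaussian stands in for the Kepler PRF, positions are in pixel units rather than sky coordinates, and a simple compass (pattern) search replaces the Levenberg-Marquardt iteration so that the example needs only the standard library. Only the structure of the model function, which solves for the flux by linear least squares at each trial position, follows the description above.

```python
import math


def gaussian_prf(x0, y0, row, col, width=1.2):
    """Stand-in for the PRF: a circular Gaussian evaluated at a pixel center."""
    r2 = (col - x0) ** 2 + (row - y0) ** 2
    return math.exp(-r2 / (2.0 * width ** 2)) / (2.0 * math.pi * width ** 2)


def model_function(x0, y0, rows, cols, values, sigmas):
    """Return (chi2, flux) with the flux solved linearly at a fixed position."""
    fhat = [gaussian_prf(x0, y0, r, c) / s for r, c, s in zip(rows, cols, sigmas)]
    phat = [p / s for p, s in zip(values, sigmas)]
    flux = sum(f * p for f, p in zip(fhat, phat)) / sum(f * f for f in fhat)
    chi2 = sum((p - flux * f) ** 2 for f, p in zip(fhat, phat))
    return chi2, flux


def fit_position(rows, cols, values, sigmas, x_start, y_start,
                 step=1.0, tol=1e-6):
    """Two-parameter position fit by compass search (not Levenberg-Marquardt)."""
    best = (x_start, y_start)
    best_chi2, best_flux = model_function(*best, rows, cols, values, sigmas)
    while step > tol:
        improved = False
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = (best[0] + dx, best[1] + dy)
            chi2, flux = model_function(*trial, rows, cols, values, sigmas)
            if chi2 < best_chi2:
                best, best_chi2, best_flux, improved = trial, chi2, flux, True
        if not improved:
            step /= 2.0                      # refine once no move helps
    return best[0], best[1], best_flux


# Noiseless synthetic image of one star on an 8 x 8 pixel grid.
rows = [r for r in range(8) for c in range(8)]
cols = [c for r in range(8) for c in range(8)]
true_x, true_y, true_flux = 4.3, 3.6, 500.0
values = [true_flux * gaussian_prf(true_x, true_y, r, c)
          for r, c in zip(rows, cols)]
sigmas = [1.0] * len(values)

x_fit, y_fit, flux_fit = fit_position(rows, cols, values, sigmas, 4.0, 4.0)
print(x_fit, y_fit, flux_fit)
```

In practice a library Levenberg-Marquardt routine would replace `fit_position`, with `model_function` supplying the flux-scaled, uncertainty-normalized model values at each trial position.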
The typical implementation of the Levenberg-Marquardt algorithm returns the Jacobian J, which contains the derivatives of the model function with respect to position. To estimate the uncertainty of the fit location we need the Jacobian of the position with respect to the pixel values given by the model function. We obtain this by inverting J, using the pseudo-inverse, to give the transformation $T = (J^{T}J)^{-1}J^{T}$. T is a P × 2 matrix, and the columns of T are normalized by the pixel uncertainties: Tij → Tij/σi for j = 1, 2. Then, the PRF-fit location covariance matrix is
, where
is the pixel covariance, and the fit location uncertainties are the square root of the diagonal of
and
.
The PRF is fit separately to the difference image and the out-of-transit image. Because the fit to the difference image (αdiff, δdiff) measures the position of the transiting source and the fit to the out-of-transit image (αOOT, δOOT) measures the position of the target star, the offset of the transit source from the target is simply (Δα,Δδ) = [(αdiff - αOOT) cos δOOT,δdiff - δOOT]. Then the offset distance and uncertainty are computed as in equation (13).
In- and out-of-transit pixel images, and therefore difference images, can only be constructed on a quarter-by-quarter basis. Images cannot be combined across quarters in a useful way because
- 1. The same star will fall on slightly different pixel locations in each quarter due to pointing differences and small asymmetries in the construction of the Kepler focal plane.
- 2. The Kepler PRF at the star’s location can have large changes from quarter to quarter.
- 3. The pixel aperture generally varies in both size and shape from quarter to quarter.
Two approaches to combining quarters will be described in § 3.4.
3.3.1. Systematic PRF-Fit Error
Systematic error in the PRF fit arises primarily from two classes of sources: error in the PRF model being fit and crowding. These errors cause biases in the offset vector (Δα, Δδ). There are various ways to control systematic PRF-fit errors, so we examine these errors in detail.
Sources of PRF-fit error
PRF Model Error. The PRF model contains various sources of error (Bryson et al. 2010a) which lead to a priori unpredictable bias in the PRF-fit centroid. Because the target star falls on different parts of the Kepler field of view in different quarters, variation of the PRF across the focal plane causes the PRF error bias to vary from quarter to quarter.
Crowding Bias. The PRF fit is a single-star fit, and therefore assumes that the target star in the out-of-transit image and the transit signal in the difference image are the only stars present in the pixels. This is rarely the case in the out-of-transit image and sometimes not the case in the difference image due to variability of field stars. Unlike the case of photometric centroids described in § 2, the effect of crowding on the PRF fit is difficult to predict. Because field stars mostly cancel in the difference image, the crowding signal in the out-of-transit and difference images can be very different. Therefore the PRF fit to the out-of-transit and difference images can have very different biases, which leads to errors in the offset vector (Δα, Δδ). An example of a target with a large amount of crowding is shown in Figure 16.

Fig. 16.— Example of a target with large amounts of crowding (KOI-1861). The in- and out-of-transit images do not appear as a typical star, and the fact that this is due to crowding is indicated by the large number of asterisks on the image, indicating many relatively bright background stars. The difference image, on the other hand, looks much more like a star because most of the background stars in the image have cancelled out, though there is still some residual background contamination. In this case, the fit to the out-of-transit image will have a large bias relative to the target star, while the bias in the difference image fit will be much smaller. This results in a biased offset measurement of the transit source relative to the target star. Visual inspection of the difference image, however, indicates that the transit source is closer to the target star than the biased measurement would indicate.
In the worst case, there is a field star in the out-of-transit image brighter than the target star, so the PRF fit to the out-of-transit image returns the centroid of the field star rather than the target star. When this bright field star cancels in the difference image, so the difference image is dominated by a transit on the target star, the offset vector (Δα, Δδ) gives the distance of the transit signal from the field star rather than the target star. The result is an incorrect measurement of a significant offset of the transit source from the target star. An example of this situation, KOI-1860 (discussed in § 2.2.1), is shown in Figure 17.

Fig. 17.— Example of a target with bright field star that captures the out-of-transit PRF fit (KOI-1860). The out-of-transit image is dominated by the bright star in the upper right corner, so this field star position will be returned by the PRF fit to the out-of-transit image. The difference image, however, shows a nicely star-shaped pattern at the location of the target star, so the target star position will be returned by the PRF fit to the difference image. The resulting offset vector measures the distance of the transit source (target star in this case) to the bright field star rather than the distance of the transit source to the target star. In this case, blindly using the offset values would lead to the erroneous identification of a background false positive.
Mitigation of the impact of PRF-fit error within a quarter
Average out-of-transit and difference images are computed for each quarter, and these are fit by the PRF to estimate the offset of the transit source from the target star. PRF model error and crowding contribute systematic errors in this estimate. Here, we discuss ways to mitigate these systematic errors within each quarter. In § 3.4.1, we discuss ways of possibly averaging out these systematics across quarters.
The Kepler PRF for nearby stars will be very nearly the same, so the PRF model error for those stars will be similar. Assuming low crowding, the PRF fit of the out-of-transit image and the fit to the difference image will have similar biases due to PRF model error. When forming the offset vector (Δα, Δδ) as the difference between these two fits, these biases should approximately cancel. We therefore prefer the offset vector computed as the difference between the difference-image fit and the out-of-transit fit when the target star is not highly crowded.
When the target star is highly crowded, crowding bias will dominate the out-of-transit PRF fit but rarely the difference image PRF fit. This bias is usually due to an error in the measurement of the target star position. As an alternative, we compute the transit source offset relative to the target star’s catalog position. We define (Δα,Δδ)catalog = [(αdiff - αcatalog) cos δcatalog, δdiff - δcatalog], where (αcatalog, δcatalog) is the catalog position of the target star (usually from the Kepler input catalog). When (Δα, Δδ) differs from (Δα,Δδ)catalog by more than a Kepler pixel (3.98″), the out-of-transit measurement of the target star position (αOOT, δOOT) likely contains large errors, and the offset vector (Δα, Δδ) should be considered unreliable. The catalog-based offset (Δα,Δδ)catalog can be used instead, but it is itself subject to error because (a) it does not mitigate fit error due to PRF model error and (b) it is subject to catalog errors due to, for example, unknown proper motion of the target star. In this case, the PRF fit results should be considered qualitative and less accurate than those for noncrowded targets, regardless of the formal propagated uncertainty. In the example in Figure 17, the magnitude of the offset vector in that quarter is about 11″, while the magnitude of the offset from the catalog position is about 0.6″.
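The consistency check between the two offset definitions can be sketched as a short Python function. It assumes that "differs by more than a Kepler pixel" refers to the magnitude of the vector difference between the two offsets, which is one reasonable reading rather than a confirmed definition. The function name and the example components, chosen only to reproduce magnitudes of about 11″ and 0.6″, are hypothetical.

```python
import math

KEPLER_PIXEL_ARCSEC = 3.98


def choose_offset(offset_oot, offset_catalog, pixel=KEPLER_PIXEL_ARCSEC):
    """Pick the offset to report; each offset is (d_alpha, d_delta) in arcsec.

    Returns the chosen offset and a label describing its origin.
    """
    disagreement = math.hypot(offset_oot[0] - offset_catalog[0],
                              offset_oot[1] - offset_catalog[1])
    if disagreement > pixel:
        # The out-of-transit fit probably locked onto crowding flux.
        return offset_catalog, "catalog (treat as qualitative)"
    return offset_oot, "out-of-transit fit"


# Hypothetical components with magnitudes of about 11" and 0.6".
crowded = choose_offset((7.8, 7.8), (0.40, 0.45))
clean = choose_offset((0.30, -0.20), (0.35, -0.10))
print(crowded, clean)
```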
A forthcoming paper (Bryson & Morton 2013, in preparation) will describe the use of modeling to identify and mitigate bias due to crowding.
In the majority of cases, the bias will be due to a mix of crowding and PRF model error, with comparably small contributions from each. In this case, we reduce the overall bias by taking advantage of the variation in bias across quarters via averaging, as described in § 3.4.
3.3.2. PRF-Fit Quality
The quarterly out-of-transit and difference images can be polluted by various types of contamination. For example, the out-of-transit image may have bright stars in addition to the target star. The difference image may have more than one stellar image due to the variability of a field star, or the transit may have low S/N, causing the difference image to be poorly formed, as in Figure 13. These cases will degrade the reliability of the PRF-fit source offset measurement. The quality of the PRF fit can be determined by evaluating the PRF at the fit position, creating a synthetic pixel image containing only one star at that position, and comparing this to the observed average pixel image. This synthetic image will have the pixel values $\tilde{p}_i = b_{\rm fit}\, f(\alpha_{\rm fit},\delta_{\rm fit},r_i,c_i)$, where the subscript “fit” refers to “diff” or “OOT,” as appropriate. These can be compared to the actual pixel values pi to determine if the fitted PRF reproduces the observed pixels. One simple comparison is to compute the correlation between $\tilde{p}_i$ and pi and declare the fit good if this correlation is above some threshold. For the difference image fit quality, we set the threshold to 0.7. When the correlation is below this threshold, then the difference image is likely dominated by noise, typically because the transit has a very low S/N. When the correlation is below threshold for the out-of-transit fit, then it is likely that there is more than one bright star in the image, which compromises the fit due to crowding. In both cases, the source offset measurement is likely to be unreliable.
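A fit-quality check of this kind can be sketched in Python as follows. The text does not say which correlation measure is used, so the Pearson correlation coefficient here is an assumption, as are the function names and the toy pixel values.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0                            # flat image: treat as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def prf_fit_is_good(observed, synthetic, threshold=0.7):
    """Declare the fit good when the correlation exceeds the threshold."""
    return pearson(observed, synthetic) > threshold


# Toy pixel values: a star-like synthetic image and two observed images.
synthetic = [1.0, 5.0, 20.0, 5.0, 1.0, 0.5]
star_like = [1.2, 4.7, 19.5, 5.4, 0.8, 0.6]
noise_like = [3.0, -2.0, 1.0, 4.0, -3.0, 2.0]
print(prf_fit_is_good(star_like, synthetic),
      prf_fit_is_good(noise_like, synthetic))
```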
3.4. Combining Quarterly Results
A comparison of PRF-fit star positions with their catalog R.A. and decl. shows that the combination of crowding and PRF error bias has an approximately Gaussian distribution with a median of 1 millipixel (0.004″) and a median absolute deviation of 22 millipixels (0.09″) (Bryson et al. 2010a). While the quarter-to-quarter variation in the PRF fit of a particular star can have larger spreads, we find that for most stars this quarter-to-quarter variation is approximately zero-mean. We therefore combine the quarterly offsets to improve the precision of the PRF-fit centroid offset vector.
3.4.1. Multiquarter Averaging
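We denote the single-quarter PRF-fit offset vectors by (Δαq, Δδq), where q labels the quarter. A simple average of Q quarters,

$$\overline{\Delta\alpha} = \frac{1}{Q}\sum_{q=1}^{Q}\Delta\alpha_q, \qquad \overline{\Delta\delta} = \frac{1}{Q}\sum_{q=1}^{Q}\Delta\delta_q,$$

with its uncertainties

$$\sigma_{\overline{\Delta\alpha}} = \frac{1}{Q}\sqrt{\sum_{q=1}^{Q}\sigma_{\Delta\alpha_q}^2}, \qquad \sigma_{\overline{\Delta\delta}} = \frac{1}{Q}\sqrt{\sum_{q=1}^{Q}\sigma_{\Delta\delta_q}^2},$$

can be used, but this has the weakness that the uncertainties do not reflect scatter in the quarterly averages. For example, a set of points on a large circle with some uncertainty will have the same average and average uncertainty as a set of points with the same uncertainty that all lie at the center of the circle. We would like the uncertainty to reflect the scatter of the quarterly offsets.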
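The circle example can be made concrete with a short Python sketch. It assumes the standard propagated uncertainty of an unweighted mean, and the offsets and uncertainties are invented for illustration.

```python
import math


def simple_average(offsets, sigmas):
    """Unweighted mean of the offsets and its propagated uncertainty."""
    q = len(offsets)
    mean = sum(offsets) / q
    sigma_mean = math.sqrt(sum(s * s for s in sigmas)) / q
    return mean, sigma_mean


def scatter(offsets):
    """Root-mean-square deviation of the offsets about their mean."""
    mean = sum(offsets) / len(offsets)
    return math.sqrt(sum((x - mean) ** 2 for x in offsets) / len(offsets))


# One coordinate of eight quarterly offsets: evenly spaced on a circle of
# radius 2 arcsec, versus all at the center, with equal 0.1 arcsec errors.
angles = [2.0 * math.pi * k / 8 for k in range(8)]
sigmas = [0.1] * 8
ring = [2.0 * math.cos(a) for a in angles]
center = [0.0] * 8

ring_mean, ring_sigma = simple_average(ring, sigmas)
center_mean, center_sigma = simple_average(center, sigmas)
```

Both sets give the same mean (to rounding) and the same propagated uncertainty, although `scatter(ring)` is about 1.4″ and `scatter(center)` is zero.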
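We accomplish this by treating the quarterly offset vectors and their uncertainties as a time series and computing the average offset ($\overline{\Delta\alpha}$, $\overline{\Delta\delta}$) by robustly fitting this time series with a constant. In other words, we compute a least-squares robust fit of a 0th-order polynomial to the quarterly data, minimizing

$$\chi^2_{\alpha} = \sum_{q=1}^{Q}\frac{\left(\Delta\alpha_q - \overline{\Delta\alpha}\right)^2}{\sigma_{\Delta\alpha_q}^2}, \qquad \chi^2_{\delta} = \sum_{q=1}^{Q}\frac{\left(\Delta\delta_q - \overline{\Delta\delta}\right)^2}{\sigma_{\Delta\delta_q}^2}.$$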
We compute a robust fit to suppress statistical outliers in the belief that these are due to transient biases resulting from systematic events such as pointing or thermal anomalies. The uncertainties in the above fit are typically returned by the robust fit algorithm used to compute the average offset ($\overline{\Delta\alpha}$, $\overline{\Delta\delta}$). Care must be taken when estimating these uncertainties a priori from the quarterly data because the spacecraft returns to the same orientation every fourth quarter, so the quarterly data are strongly correlated from year to year.
The above estimate of the average uncertainty assumes Gaussian statistics. While PRF-fit biases appear nearly Gaussian in the statistical sense, they may not be Gaussian for individual targets. We therefore compute an alternative uncertainty via bootstrap analysis, which provides a more general estimate of the uncertainty. We use a resample-with-replacement strategy, creating an ensemble of Q² simple multiquarter averages. Specifically, given the set of Q measured offsets (Δα1, Δα2, …, ΔαQ), Q² realizations are created, where in each realization we replace each element with an offset randomly chosen from the measured set. Examples of these realizations when Q = 5 include (Δα3, Δα1, Δα5, Δα4, Δα2) and (Δα2, Δα4, Δα1, Δα4, Δα1). Averages are computed for each of these realizations, and the standard deviation of the resulting ensemble of Q² averages provides the bootstrap uncertainty estimate. The bootstrap uncertainty is typically very similar to the uncertainty returned by the robust fit described above, but can be significantly different for specific targets. We choose the larger of the two uncertainty estimates as the final uncertainty estimate for the multiquarter average σΔα. A similar analysis applies to σΔδ.
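The bootstrap estimate described above might be sketched as follows in Python. The function names are hypothetical, the use of the sample standard deviation is an assumption, and the robust-fit uncertainty is taken as an input rather than computed.

```python
import random
import statistics


def bootstrap_sigma(offsets, seed=None):
    """Bootstrap uncertainty of the multiquarter average of one coordinate.

    Draws Q*Q realizations, each made of Q offsets sampled with replacement
    from the measured set, and returns the standard deviation of their means.
    """
    rng = random.Random(seed)
    q = len(offsets)
    means = [statistics.fmean(rng.choices(offsets, k=q)) for _ in range(q * q)]
    return statistics.stdev(means)


def final_sigma(sigma_robust_fit, sigma_bootstrap):
    """Adopt the larger of the two uncertainty estimates."""
    return max(sigma_robust_fit, sigma_bootstrap)
```

For example, `final_sigma(0.05, bootstrap_sigma(delta_alpha))` would give the adopted uncertainty for a list `delta_alpha` of quarterly offsets, assuming a robust-fit uncertainty of 0.05″.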
Examples of this multiquarter averaging technique are shown in Figures 18–22. Figure 18 shows a case with no significant offset, while Figure 19 shows a case with a significant offset, indicating that the transit signal is on a background star. For long-period transiting planets, where there are few quarters that contain transits, the benefits of multiquarter averaging will diminish. In such cases, however, multiquarter averaging can often provide good results, an example of which is shown in Figure 20. Figure 21 shows the low S/N example discussed in § 3.1, where we see that there is a large scatter in the quarterly measurements, but the multiquarter average is within three standard deviations of the target star.

Fig. 18.— Example of multiquarter offset analysis when the transit signal seems to be on the target star (KOI-221). In both figures, the x- and y-axes give the offsets Δα and Δδ, with (0, 0) being the catalog location of the target star. The green crosses show the individual quarter offsets labeled by quarter, and the lengths of the crosses are equal to the uncertainties σΔα and σΔδ. The location of the multiquarter average ($\overline{\Delta\alpha}$, $\overline{\Delta\delta}$) is shown as a magenta cross (obscured by the tight cluster of green crosses). The blue circle has radius equal to 3 times the uncertainty in the magnitude of ($\overline{\Delta\alpha}$, $\overline{\Delta\delta}$). Star locations relative to the target star are shown as asterisks, with the target star in red (there happen to be no other stars in this figure). The KIC number and Kepler magnitudes are shown next to each star. We see that most offsets are tightly clustered within 0.1″ of the target star with Q1 and Q2 as outliers. Left: Offsets (Δα, Δδ) relative to the PRF fit to the out-of-transit centroid. Right: Offsets (Δα, Δδ)catalog relative to the catalog position of the target star. The difference between the left and right plots is not a simple translation because the two plots have different biases due to PRF error and crowding (see § 3.3.1).

Fig. 19.— Example of multiquarter offset analysis when the transit signal seems to be on a different star than the target star (KOI-109). The quarterly offsets are tightly clustered around the star KIC 4752452, indicating that this star is the source of the transit. See the caption to Figure 18 for a description of these plots.

Fig. 20.— Example of multiquarter offset analysis for a confirmed planet signal (Kepler-22b) with a very long-period orbit, so only four quarters show transits. The result is a larger scatter and higher average uncertainty compared to the case where there are transits present in every quarter. Also, there is a significant difference in the offsets relative to the out-of-transit centroid in the left panel and relative to the target star’s catalog position in the right panel. This is likely due to a combination of not-fully-averaged PRF bias and catalog error. If this planet were not confirmed by other methods (Borucki et al. 2012) we would have only moderate confidence that the transit signal is on the target star. See the caption to Figure 18 for a description of these plots.

Fig. 21.— Example of multiquarter offset analysis for a low S/N transit signal (KOI-2949) with S/N = 11. In this case, the quarterly offsets have a large scatter measured in arcseconds, but the average across quarters is within three standard deviations of the target star. See the caption to Figure 18 for a description of these plots.
The case of KOI-1860, where a bright field star at the edge of the captured pixels introduces large systematic error, is examined in Figure 22. The offset relative to the out-of-transit centroid is measured to be about 4″, which is a statistically significant 4σ. For most quarters, particularly those which would show a larger offset, the PRF fit to the out-of-transit image failed because the bright star falls very close to the edge of the captured pixels. The offset relative to the catalog position, however, is much smaller, with a multiquarter average of about 0.3″ or 1σ. Because we are aware of the bright star crowding for KOI-1860, we defer to the offset relative to the catalog position, which is not statistically significant.

Fig. 22.— Example of multiquarter offset analysis for a target star (KOI-1860, also discussed in § 2.2.1) whose pixels contain a brighter field star (see Fig. 17). The offsets relative to the out-of-transit centroid are large because the bright star captured the out-of-transit PRF fit. The out-of-transit PRF fit also failed in many quarters because the bright star is at the edge of the pixel aperture. The offsets relative to the target star’s catalog position are, however, well clustered around the target star, indicating that the offset of the transit is not statistically significant. We therefore conclude that the large offset relative to the out-of-transit centroid is due to systematic effects from the bright field star in the pixels. See the caption to Figure 18 for a description of these plots.
We demonstrate the increased precision of the multiquarter average in Figure 23. The offset distance from the target catalog position is shown for both individual quarter PRF fits and their quarterly average. This analysis uses 2278 KOIs whose quarterly averaged offsets are less than 3σ and whose offsets from the target are < 5″ in the Q1–Q12 data. The left panel shows the 21,401 individual quarter offsets, while the right panel shows the offset of the average over all quarters for each target. The individual quarter offsets have a standard deviation of 0.90″, while the multiquarter averages over 12 quarters have a standard deviation of 0.41″. Strong year-to-year correlations prevent the standard deviation from scaling as $1/\sqrt{Q}$, but do not prevent an improvement as Q increases.

Fig. 23.— Distributions of the PRF-fit offset from the target catalog position for 2278 KOIs whose quarterly averaged offsets are less than 3σ and whose offsets from the target are < 5″. Left: Distribution of individual quarter offsets. Right: Distribution of the multiquarter averages.
Figure 24 shows how the standard deviation depends on the number of quarters averaged. We see that adding a quarter always statistically increases the precision of the multiquarter average, though this may not be the case for every individual target.

Fig. 24.— Standard deviation of the multiquarter average as a function of the number of quarters used in the average. The x-axis shows quarters used, where for each point the average is taken for the transits found in quarters 1 through the x-axis value.
3.4.2. Joint Multiquarter PRF Fit
When the transit S/N is very low, there may not be enough signal in each quarterly difference image to support per-quarter PRF fitting. In this case, we perform a joint multiquarter fit, where the pixel images for all quarters are supplied to the PRF fitter, and the single R.A. and decl. (and quarter-specific PRF amplitude) are found that minimize the pixel-level difference between the pixel images and PRF-reconstructed pixels over all quarters. In other words, the joint multiquarter fit finds the single sky position (α, δ) that minimizes the function

$$\chi^2 = \sum_{q=1}^{Q}\sum_{i}\frac{\left[p_{i,q} - b_q f_q(r_{i,q}, c_{i,q}, \alpha, \delta)\right]^2}{\sigma_{i,q}^2},$$

where the subscript q denotes the quarter-specific value of each quantity. So, in each quarter the flux-normalized PRF $b_q f_q$ for that quarter is evaluated at (α, δ) (which is common to all quarters) for that quarter’s pixels $(r_{i,q}, c_{i,q})$. These PRF-based pixel values are subtracted from the observed pixel values $p_{i,q}$ for each quarter. The square of this difference, normalized by the uncertainty $\sigma_{i,q}$, is summed over all the pixels in that quarter and, finally, summed over all quarters, producing the test χ² value. The sky position is varied until the (α, δ) that minimizes χ² is found. The details of the computation in each quarter are similar to the single-quarter fit in § 3.3.
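The structure of the joint fit can be illustrated with a toy Python sketch. Several simplifications are assumptions made here for brevity and are not the pipeline's method: the PRF is replaced by a circular Gaussian, the common position is expressed directly in pixel coordinates shared by all quarters, each quarter has a single scalar pixel uncertainty, and the minimization is a brute-force grid search.

```python
import math


def toy_prf(n_rows, n_cols, row0, col0, width):
    """Stand-in PRF: circular Gaussian on a pixel grid, normalized to unit sum."""
    f = [[math.exp(-((r - row0) ** 2 + (c - col0) ** 2) / (2.0 * width ** 2))
          for c in range(n_cols)] for r in range(n_rows)]
    total = sum(map(sum, f))
    return [[v / total for v in row] for row in f]


def joint_chi2(row0, col0, quarters):
    """Chi-square summed over pixels and quarters for one trial position.

    quarters: list of (image, sigma, width) with image a 2D nested list,
    sigma a scalar pixel uncertainty, and width the toy PRF width.
    """
    chi2 = 0.0
    for image, sigma, width in quarters:
        f = toy_prf(len(image), len(image[0]), row0, col0, width)
        pairs = [(p, m) for p_row, f_row in zip(image, f)
                 for p, m in zip(p_row, f_row)]
        # Quarter-specific amplitude b_q: linear least squares at fixed position.
        b = sum(p * m for p, m in pairs) / sum(m * m for _, m in pairs)
        chi2 += sum(((p - b * m) / sigma) ** 2 for p, m in pairs)
    return chi2


def joint_fit(quarters, grid):
    """Grid search for the single position minimizing the joint chi-square."""
    best = min((joint_chi2(r, c, quarters), r, c) for r in grid for c in grid)
    return best[1], best[2]


# Three noiseless toy quarters sharing one source position (3.2, 2.6)
# but with different amplitudes and PRF widths.
quarters = [([[amp * v for v in row] for row in toy_prf(7, 7, 3.2, 2.6, w)], 1.0, w)
            for amp, w in [(100.0, 1.0), (250.0, 1.2), (60.0, 0.9)]]
grid = [2.0 + 0.2 * k for k in range(11)]
fit_row, fit_col = joint_fit(quarters, grid)
```

Because the amplitude enters linearly, it is solved in closed form for each quarter at each trial position, leaving only the common position to be searched.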
The propagated uncertainty in this fit does not account for scatter across quarters due to systematic error, so it dramatically underestimates the actual uncertainty in this fit. We compute a more accurate uncertainty via a bootstrap approach much like that for the multiquarter averages described in § 3.4.1, except the data consist of pixel images rather than offsets and each element of the ensemble is a joint PRF fit. Specifically, the multiquarter PRF fit takes as input the set of pixel images (I1, I2, …, IQ) constructed in § 3.2, where Iq is the pixel image for quarter q. The bootstrap approach creates an ensemble of sets of pixel images resampled with replacement, for example (I4, I5, I3, I2, I2) if Q = 5. The multiquarter fit is performed on each element of the ensemble, computing a best-fit (α, δ) for each one. Each pixel image in a resampled set is fit with the parameters of its own quarter. For example, if the first image in a resampled set is I4, then the PRF from quarter 4 is applied to those quarter 4 pixels. The uncertainty in the joint multiquarter fit is then set to the standard deviation of the ensemble of fit positions.
The size of the resampled ensemble needs to be chosen with care. The time to compute the joint multiquarter fit scales with the number of quarters Q. If the usual choice of Q² were made for the size of this ensemble, the full computation of the joint fit and its uncertainties would scale as Q³. In the Kepler pipeline, a bootstrap joint fit of 8 quarters took about 20 minutes, which indicates that a 16-quarter fit would take almost 3 hr. It is prohibitive to run this on all 15,000 to 20,000 threshold crossing events identified by the pipeline. The joint PRF fit is therefore not routinely run in the Kepler pipeline, but is reserved for low S/N transits for which the multiquarter average does not provide a sufficiently precise result. The possible use of a smaller resampled ensemble is under investigation.
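The run-time estimate follows directly from the cubic scaling, as this small Python check shows. The function name is hypothetical; the reference values are those quoted in the text.

```python
def bootstrap_joint_fit_minutes(n_quarters, ref_quarters=8, ref_minutes=20.0):
    """Scale the quoted run time assuming the cost grows as Q**3."""
    return ref_minutes * (n_quarters / ref_quarters) ** 3


hours_for_16 = bootstrap_joint_fit_minutes(16) / 60.0  # about 2.67 hr
```

Doubling the number of quarters multiplies the cost by eight, so 20 minutes becomes 160 minutes.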
4. PIXEL CORRELATION IMAGES
The “pixel correlation method” computes the degree to which the transit signal over time appears in each pixel. This information is used to create a pixel image, where the value of each pixel is the degree of correlation between the pixel flux and the transit signal. This image is centroided via PRF fitting similar to the difference image method. This method has a different response to nontransit photometric variability from the photometric and difference image methods, so it can be useful for resolving cases when the other methods provide ambiguous results.
The correlation between the pixel-level flux and the transit signal over time is computed via a fit of the transit model to the individual pixel flux time series. This uses the same fitting method described in § 2.2, with the centroid time series replaced by the pixel flux time series. In this case the fit constant γ is a measure of the presence of the transit signal in each individual pixel. An example of these fits is shown in Figure 25. A “pixel correlation image” can be constructed by setting the value of each pixel to its model fit value γ. When this is done for the example in Figure 25, we get the pixel image in the left panel of Figure 26. The right panel of Figure 26 shows an example where the transit signal is offset from the target star. For such high S/N targets, the transit signal is readily apparent in the pixels, and the correlation image has a star-like appearance. In these cases, the photometric or PRF centroiding can be applied to quantitatively and automatically compute the location of the transit, which can be compared to the catalog position of the target star or the target star location from the PRF fit to the difference image.
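A pixel correlation image of this kind might be built as in the following Python sketch. It assumes that, for detrended pixel time series, the fit reduces to a linear least-squares scale factor γ multiplying the transit model; the fitting method of § 2.2 may differ in detail, and the function name and data layout are hypothetical.

```python
def correlation_image(pixel_series, transit_model):
    """Per-pixel least-squares scale factor gamma.

    pixel_series: dict mapping (row, col) to a detrended flux time series.
    transit_model: transit model sampled at the same times (zero out of transit).
    For each pixel, gamma minimizes sum_t (flux_t - gamma * model_t)**2.
    """
    norm = sum(m * m for m in transit_model)
    return {pixel: sum(m * f for m, f in zip(transit_model, flux)) / norm
            for pixel, flux in pixel_series.items()}


# Toy data (invented): one pixel contains the transit, one contains only noise.
model = [0.0, 0.0, -1.0, -1.0, 0.0, 0.0]
series = {
    (0, 0): [0.0, 0.0, -5.0, -5.0, 0.0, 0.0],
    (0, 1): [0.1, -0.1, 0.0, 0.0, -0.1, 0.1],
}
image = correlation_image(series, model)
```

In this toy case the pixel at (0, 0) receives γ = 5 and the pixel at (0, 1) receives γ = 0, so the image is bright only where the transit signal is present.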

Fig. 25.— Fits of the transit model to individual pixel flux time series for KOI-221 in quarter 7. The pixel flux time series is shown in blue and the transit model is in red. Each pixel flux time series is detrended and folded on the transit period. A closeup of the transit event is shown, with the same time interval on all x-axes. The y-axes show the pixel values and are scaled to show the variation in each pixel time series. The pixel rows are shown along the left and pixel columns are shown along the bottom. The pixels that strongly contain the transit signal indicate the location of the transit source.

Fig. 26.— Correlation images, created by assigning each pixel the scale factor that multiplies the transit model to best fit that pixel’s flux time series. Left: Example from Figure 25 of the transit signal being coincident with the target star (KOI-221). Right: Example with the transit signal significantly offset from the target star (KOI-109). In these figures, the small white squares indicate pixels for which the fit scaling is above a threshold.
When the transit has low S/N or the pixels have significant flux from other sources, the pixel correlation image can be of much lower quality. Two examples of this situation are shown in Figure 27.

Fig. 27.— Correlation images for more problematic transits. Left: Example where there is a field star in the aperture brighter than the target star (KOI-1860). Variability of the bright star pollutes the correlation image, but the transit signal is still apparent. Right: Low S/N example (KOI-2949) with S/N = 11. For such low S/N transits, the transit signal is barely discernible in the individual pixel time series, which causes the correlation image to be dominated by background variability and pixel-level systematics.
Because the correlation image is degraded by background flux and can have poor behavior at low S/N, it is not generally used for false-positive identification. There are circumstances, however, where the correlation image can be used in combination with the other methods to make a determination. For example, some low S/N targets have marginal difference and correlation images, but if they show the transit signal in the same pixel location then we have increased confidence that the transit signal in those pixels is real.
5. SATURATED TARGETS
Target stars with Kepler magnitudes brighter than ∼11.5 can exhibit saturation, where the flux in a pixel exceeds that pixel’s full well and spills up and down the pixel columns (Caldwell et al. 2010). The result is that the pixel image of the star can be highly distorted, invalidating all of the centroid methods described in this paper. Saturation can be highly asymmetric, so even photometric centroids are of limited use. Visual inspection of the difference image can, however, reveal large, multipixel offsets, indicating that the transit is not on the saturating star.
When the saturated star is the transit source, the difference image will have a distinctive, non-star-like pattern. Because the saturation spills along columns and the amount of spill is approximately proportional to the flux of the star, a transit signal on a saturated star will appear in the difference image as changes at the ends of the saturated columns. An example is shown in Figure 28. This is a characteristic pattern in the difference images for saturated targets. All that can be said in this case is that the transiting source is in approximately the same column position as the target star, between the ends of the saturation. If the transit were due to a field star that is not in the saturated pixels, the difference image would show that star and not the signal from the saturated pixels.

Fig. 28.— Example of a transit signal on a saturated star for the confirmed planet Kepler-21b (Howell et al. 2012). The host star has Kepler magnitude = 8.4 and is highly saturated. In the difference image, the transit is apparent in the pixels at the end of the saturation in columns 612 and 613 (the star labels have been removed from the difference image for clarity). The target star is near the boundary between these two columns, which is why there is about equal saturation in both columns. Note the strong asymmetry in the saturation for this quarter, with the saturation going up the columns significantly further than down.
Special investigation of saturated targets can sometimes refine the location of the transit signal. The appearance of the transit at the end of the saturated columns is sensitive to the column position of the transiting source. If the transit S/N is high enough, the wings of the transits can be subject to a PRF fit while masking out the saturated columns. These techniques have been applied with some success, identifying the location of the transit signal to within 4″, for Kepler-21b (Howell et al. 2012). We refer the reader to that publication for details.
6. PERFORMANCE AND COMPARISON OF TECHNIQUES
In this section we examine the performance of our transit-source location estimation via photometric and PRF-fit centroids. We focus on offset distances because that is the high-level metric used in initial false-positive identification. We examine three populations of targets:
- 1. All Kepler objects of interest (KOIs) dimmer than Kepler magnitude 11.5 [to avoid saturated targets (Caldwell et al. 2010)], which have well-defined transit-like signals of sufficient quality to pass vetting and produce an ephemeris and valid PRF fits (4049 KOIs). Many of these KOIs are in multiple systems.
- 2. Unsaturated KOIs that have been identified as being due to transit sources that are unlikely to be on the target, called active pixel offsets (APOs), that have valid PRF fits as of 2012 July (178 KOIs).
- 3. A small number of APO KOIs whose transit signals have been identified with stars in the Kepler input catalog (16 KOIs).
In this section we focus on the following questions:
- 1. How well do the methods identify the location of these sources?
- 2. Is there evidence that the source locations correspond to a uniform distribution of background sources?
- 3. How do these methods compare with one another with respect to accuracy and precision?
We also address an issue that arises with high-transit-S/N targets, where offsets can be very small but the formal uncertainty can be much smaller. In this situation, we encounter residual bias that is not accounted for in the uncertainty, which causes offsets to incorrectly seem statistically significant.
6.1. Accuracy
We use APO targets whose transit signals have been associated with known stars to measure how accurately our two primary methods of photometric and PRF-fit centroids identify the source location. This association is determined by manual investigation of the difference images independently of the offset computations. We see in Figure 29 that the PRF estimate of the transit source offset is close to the star identified as the transit signal source. For APOs with small offsets (< 4″) the photometric centroids also have good accuracy. For APOs with larger offsets, however, photometric centroids show large errors. This behavior is expected because the Kepler pipeline uses one set of pixels to estimate the depth of the transit signal and a larger set of pixels to compute the photometric centroid. As described in § 2.3.1, when the transit source has significant flux that falls outside the pixels used for the depth estimate, which is the case when the source is more than 4″ from the target star, there can be significant error in the transit source location inferred from the photometric centroids.

Fig. 29.— Left: Distance of the PRF-fit and photometric centroids from known stars that are likely to be the source of confirmed APO transit signals (y-axis) vs. the distance of the known star from the target star (x-axis). Right: Same stars, showing the offset of the centroid from the target star (y-axis). The PRF offsets are relative to the target star catalog location for consistency with the photometric offsets.
Figure 30 compares the PRF-fit and photometric centroid source offset estimates for all KOIs and shows that the photometric centroid estimate of the source offset is generally (but not always) larger than the PRF-fit estimate when the PRF-fit source location is more than a few arcseconds from the target.

Fig. 30.— Left: Comparison between the PRF-fit offsets (x-axis) and the photometric centroid source offsets (y-axis) from the target star catalog position. Right: Ratio of PRF-fit offsets/photometric centroid source offsets (y-axis) vs. magnitude of the PRF-fit offsets (x-axis). APO KOIs are marked by circles. The red line in both figures indicates equality between the PRF-fit and photometric offsets. We see that the photometric centroid estimate of the source distance agrees with the PRF estimate for distances of a few arcseconds from the target star. As expected, the photometric centroid usually overestimates the offset for transit sources that are further from the target star (see § 2.3.1).
Figure 31 compares the PRF-fit source offset relative to the target star catalog position with the PRF-fit source offset relative to the out-of-transit PRF-fit centroid. These two offsets are similar for the majority of stars, with outliers that are likely due to bias due to crowding.

Fig. 31.— Comparison of the PRF-fit source offset relative to the PRF fit to the out-of-transit pixel image (x-axis) and the PRF-fit source offset relative to the catalog position of the target star (y-axis). APO KOIs are marked by circles. We see that most targets with large offsets cluster along the diagonal, indicating that the two offsets are generally in reasonable agreement. Outliers are likely due to crowding issues.
Figure 32 compares the distribution of the APO KOIs and the distribution of observed pixel area relative to target stars. The fact that these two distributions have similar shapes with similar peaks is consistent with the identified APOs representing a uniform background of eclipsing binaries and possibly large planetary transits. This consistency contributes to our confidence that the APOs are correctly identifying astrophysical false positives.

Fig. 32.— Left: Distribution of PRF-fit source offsets for targets identified as APOs. There is a strong peak at about 6″–7″. This distribution is strongly dependent on the pixel aperture associated with each target star, which limits the offset that can be detected. Right: Distribution of pixel area as a function of distance from the target star associated with each pixel, across the Kepler field of view. This distribution also has a peak at about 7″. The similarity between these two distributions is consistent with the identified APOs representing a uniform distribution of background sources such as eclipsing binaries and large transiting planets.
6.2. Precision Versus S/N
The precision of a centroid measurement is dependent on the strength of the transit signal in each pixel. This strength depends on the transit depth, host star brightness, and number of transits, among other factors. All of these factors contribute to the transit S/N, so we analyze precision as a function of transit S/N. Figure 33 shows the dependence of formal centroid source offset uncertainty on transit S/N. Both the PRF-fit and photometric centroid methods show similar dependencies, though the uncertainties for the PRF-fit centroid method are somewhat smaller. A linear fit to the log-log data gives the uncertainty of the two methods as

These fits, along with the range of values implied by the 1-σ uncertainties in the fit parameters, are shown in Figure 34. The uncertainty of the photometric centroid method is inversely proportional to the S/N, as expected, while the PRF-fit method has a somewhat weaker dependence on inverse S/N. The coefficients of these uncertainties (13.6 for photometric uncertainties and 3.39 for the PRF fit) are larger than the full width at half-maximum expected for centroid uncertainties because these uncertainties include contributions from the offset computation. The uncertainties reported in this section are propagated formal uncertainties, however, which are only valid if all noise sources are zero-mean Gaussian white noise. As described in this paper, there are several sources of systematic error that impact transit source offset estimation. These systematic errors are not reflected in the formal uncertainty.
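A linear fit in log-log space corresponds to a power law of the form σ = A (S/N)^p. The following Python sketch shows the mechanics of such a fit. The data are synthetic, constructed as exactly 13.6/(S/N) from the photometric coefficient quoted in the text, and are for illustration only; they are not the measured uncertainties.

```python
import math


def fit_power_law(snr, sigma):
    """Fit sigma = A * snr**p by ordinary least squares on log10 values.

    Returns (A, p).
    """
    x = [math.log10(s) for s in snr]
    y = [math.log10(s) for s in sigma]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return 10.0 ** intercept, slope


# Synthetic, noise-free example with an exact inverse dependence on S/N.
snr = [10.0, 20.0, 50.0, 100.0, 200.0, 500.0, 1000.0]
sigma = [13.6 / s for s in snr]
coefficient, exponent = fit_power_law(snr, sigma)
```

For this noise-free input the fit recovers a coefficient of 13.6 and an exponent of −1 to within rounding.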

Fig. 33.— Formal offset uncertainty vs. transit S/N for PRF fit (left) and photometric (right) centroids using 12 quarters of data. The red dashed line in both figures shows the 1/(S/N) dependency for comparison. We see that the precision of the PRF-fit offsets is somewhat better on average than the photometric centroid offsets. This precision does not account for bias due to systematic error for either type of centroid.

Fig. 34.— Uncertainty vs. S/N from the fits in Figure 33, plotted on linear scales. The dotted lines indicate the range of variation due to the 1-σ uncertainties in the fit parameters.
Because the dependences of the PRF-fit and photometric centroid estimates of the source offset on S/N have similar log slopes, we expect that if one technique indicates a significant offset then the other technique will as well. This is shown in Figure 35, which indicates that for most targets the photometric centroid and PRF-fit methods are in agreement as to whether there is a significant offset for a particular target. But there are many targets, including a few identified APOs, that have photometric centroid source offsets < 3σ but PRF-fit source offsets >3σ and vice versa.

Fig. 35.— Comparison of the PRF-fit source offset relative to the catalog position of the target star (x-axis) and the photometric centroid source offset (y-axis), both in units of σ. The vertical and horizontal lines mark where the offset = 3σ, above which the offset is considered statistically significant. APO KOIs are marked by circles. We see that most targets have both offsets below 3σ, but there are a significant number of targets for which the photometric centroid source offset is less than 3σ but the PRF-fit offset is >3σ and vice versa.
Quantitatively, for 54.9% of all KOIs the two techniques are in agreement that the source offset is < 3σ; 24.7% of all KOIs have agreement that the source offset is >3σ; 13.9% of all KOIs have offsets >3σ according to the PRF-fit technique but < 3σ according to photometric centroids; and 6.45% of all KOIs have offsets < 3σ according to the PRF-fit technique but >3σ according to photometric centroids. Therefore, the two methods are in agreement on significance for about 80% of the targets. Most of the targets for which the PRF-fit techniques indicate an offset >3σ but the photometric centroids have a shift < 3σ have very small PRF-fit offsets, so they are at distances where residual bias dominates, as discussed in § 6.3.
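The four categories in this comparison can be tallied as in the following Python sketch. The function and key names are hypothetical, and the input values are invented for illustration.

```python
def significance_agreement(prf_offsets_sigma, phot_offsets_sigma, threshold=3.0):
    """Fraction of targets in each of the four significance categories.

    Inputs are offsets in units of sigma for the same targets, in the same order.
    """
    counts = {"both_below": 0, "both_above": 0, "prf_only": 0, "phot_only": 0}
    for prf, phot in zip(prf_offsets_sigma, phot_offsets_sigma):
        prf_significant = prf > threshold
        phot_significant = phot > threshold
        if prf_significant and phot_significant:
            counts["both_above"] += 1
        elif prf_significant:
            counts["prf_only"] += 1
        elif phot_significant:
            counts["phot_only"] += 1
        else:
            counts["both_below"] += 1
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


# Invented offsets (in sigma) for four targets, one in each category.
fractions = significance_agreement([1.0, 4.0, 5.0, 2.0], [2.0, 5.0, 1.0, 6.0])
agreement = fractions["both_below"] + fractions["both_above"]
```

Applied to the fractions quoted above, the agreement is 54.9% + 24.7% = 79.6%, which is the "about 80%" stated in the text.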
The results described in the previous paragraph should only be taken as a comparison of the photometric centroid and difference image techniques, rather than a statistical measurement of the APO population in the Kepler data. When both the difference image and photometric centroid methods agree that there is a significant offset, this offset is likely to indicate an APO due to a background false positive, and each individual case must be examined to ensure that the offset is not actually due to the systematic errors described in this paper. When one of the methods indicates a significant offset but the other does not, it is less likely that the offset is due to a background false positive rather than systematic error. However, an approximately 25% significant APO rate is consistent with the observed APO rate described in § 1, averaged over the Kepler field of view.
6.3. Residual Bias and High S/N Transits
As described in § 3.3.1, the computation of the PRF-fit source offset is subject to various kinds of bias due to PRF error and crowding. When the transit S/N is high, both centroid methods will have very high formal precision with very small uncertainties. The PRF-fit source offset estimate essentially hits a noise floor, where the offsets are dominated by residual biases. Figure 36 shows that this noise floor begins to be apparent at source offsets of about 2″, where there is a noticeable increase in objects with offsets between 3σ and 4σ. Below about 0.2″ there is a large excess of objects with large offsets in units of σ. The right panel of Figure 36 shows targets with high S/N. In this population, offsets are mostly very small, and we find most of the large excess of high-σ offsets. We interpret this to mean that residual biases in the PRF-fit source offset are dominant under 0.2″.

Fig. 36.— Relationship between the PRF-fit source offset (x-axis) and source offset in units of sigma (y-axis). Left: All KOIs. Right: KOIs with transit S/N > 100. On the left, we see that for offsets < 3″ there seems to be an excess of targets with offset >3σ (red line). On the right, we see that for high S/N targets the offset is small, but there is an excess of targets with offset >3σ. This is likely due to residual bias from the errors discussed in § 3.3.1.
Figure 37 shows a similar analysis for photometric-centroid-based source offsets. The excess of significantly offset targets is apparent but less severe in this case.

Fig. 37.— Relationship between the photometric centroid source offset (x-axis) and source offset in units of sigma (y-axis). Left: All KOIs. Right: KOIs with transit S/N > 100. Many KOIs fall outside the plot, but our interest is in small-offset behavior. On the left, we see that for offset < 0.2″ there seems to be an excess of targets with offset >3σ (red line). On the right, we see that for high S/N targets the offset is small, but there is an excess of targets with offset >3σ.
We mitigate the impact of residual bias on small offset/high S/N targets in PRF-fit estimates of the source offset in two ways:
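- 1. Adding a small constant “noise floor” to reflect the residual bias. Because bias seems to dominate at less than 0.2″, we want to avoid classifying any target with a source offset less than 0.2″ as an APO false positive. Because this classification is based on a 3σ threshold, we add σ0 = (0.2/3) arcseconds in quadrature to the formal uncertainty in each component of the offset. Writing $\sigma_{\Delta\alpha}$ and $\sigma_{\Delta\delta}$ for the formal uncertainties of the R.A. and decl. components, this is
$$\sigma_{\Delta\alpha} \rightarrow \sqrt{\sigma_{\Delta\alpha}^2 + \sigma_0^2}, \qquad \sigma_{\Delta\delta} \rightarrow \sqrt{\sigma_{\Delta\delta}^2 + \sigma_0^2}.$$
(This has the same effect on the offset distance uncertainty σD as adding σ0 to σD in quadrature.) The impact of adding this noise floor is shown in Figure 38.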
Fig. 38.— Effect of adding a small constant to the PRF-fit source offset uncertainty on the relationship between the PRF-fit source offset (x-axis) and source offset in units of sigma (y-axis). Left: All KOIs from Figure 36. Right: Same targets with a constant (0.2/3) arcseconds added to the formal uncertainty in quadrature. The excess of targets exceeding 3σ at offset < 0.2″ has been removed.
- 2. Giving special treatment to the vetting of targets with small source offsets. A simple example set of rules for manually vetting false positives is the following:
- A. Pass all targets with offsets < 0.2″ (this happens automatically when using the above noise floor).
- B. For targets with offsets < 1″, manually investigate those targets with offsets >3σ.
- C. For targets with offsets between 1″ and 2″, manually investigate those targets with offsets between 3σ and 4σ.
- D. For targets with offsets between 1″ and 2″, declare targets with offsets above 4σ to be APOs.
- E. For targets with offsets >2″, declare targets with offsets above 3σ to be APOs.
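The noise floor and the example rules A through E can be sketched in a few lines of Python. This is an illustrative sketch, not the pipeline implementation: the function names, the first-order propagation of the component uncertainties into the distance uncertainty (assuming uncorrelated components), the treatment of offsets at or below 3σ as passing, and the handling of the exact boundaries at 1″ and 2″ are all assumptions made here.

```python
import math

SIGMA_0 = 0.2 / 3.0  # noise floor in arcseconds (value from the text)


def offset_significance(d_alpha, d_delta, sig_alpha, sig_delta, sigma_0=SIGMA_0):
    """Return (offset distance in arcsec, offset in units of sigma).

    sigma_0 is added in quadrature to each component uncertainty. The
    distance uncertainty uses first-order propagation for uncorrelated
    components, which is an assumption of this sketch.
    """
    sig_a = math.hypot(sig_alpha, sigma_0)
    sig_d = math.hypot(sig_delta, sigma_0)
    dist = math.hypot(d_alpha, d_delta)
    if dist == 0.0:
        return 0.0, 0.0
    sig_dist = math.sqrt((d_alpha * sig_a) ** 2 + (d_delta * sig_d) ** 2) / dist
    return dist, dist / sig_dist


def vet(dist, n_sigma):
    """Apply example rules A-E; returns 'pass', 'investigate', or 'APO'.

    Boundary handling (<, <=) is a choice made for this sketch.
    """
    if dist < 0.2 or n_sigma <= 3.0:
        return "pass"          # rule A, or offset not significant
    if dist < 1.0:
        return "investigate"   # rule B
    if dist <= 2.0:
        # rules C and D
        return "investigate" if n_sigma <= 4.0 else "APO"
    return "APO"               # rule E


# Hypothetical high-S/N target: tiny formal errors, 0.15 arcsec offset.
dist, n_sigma = offset_significance(0.15, 0.0, 0.001, 0.001)
print(dist, n_sigma, vet(dist, n_sigma))
```

With the noise floor in place, any offset below 0.2″ has a significance below 3σ regardless of how small the formal uncertainties are, which is why rule A is satisfied automatically.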
7. CONCLUSIONS
Many background astrophysical false positives can be identified through centroid analysis of Kepler pixel data. The high photometric precision of the Kepler data provides opportunities to identify such objects close to the target star, but great care must be taken to account for various systematic biases. We have presented three different techniques, two of which were analyzed in detail. This ensemble provides a powerful arsenal of tools for dispositioning nearly all KOIs.
The PRF-fit technique provides the best accuracy in the localization of transit sources that are not on the target star. The photometric centroid technique behaves best when the target star is isolated and the transit source is close to (or is) the target star. The photometric centroid technique is therefore useful for confirming that the transit is on the target star when this is also indicated by the PRF-fit technique. The photometric centroid technique can indicate when the transit source is separated from the target star, but when the separation is more than a few arcseconds, the source location determined by photometric centroids is unreliable.
When the S/N is low or there is significant crowding, the PRF technique can break down. In this case, the photometric technique may provide the best evidence that the centroid is on the target star. The pixel correlation images can also be useful in this circumstance, though the pixel correlation technique is fragile.
We find that we often use all three techniques when investigating a difficult target. This toolbox of techniques is a critical component of the Kepler planet candidate vetting process and makes a significant contribution to the reliability of the Kepler planet candidate list.
We gratefully acknowledge the outstanding work of the entire Kepler team that performs the data acquisition and analysis and delivers the precision that makes the techniques described in this paper possible. We particularly thank the Kepler Science Operations Center and Science Office for their support and creativity while these techniques were being developed. We thank Martin Still, Susan Thompson, and Jeff Coughlin for valuable comments on early drafts of this paper. Finally, we thank Bill Borucki, Ted Dunham, Dave Latham, Nick Gautier, and the wider Kepler science community for constant support and encouragement.
Kepler was competitively selected as the tenth Discovery mission. Funding for this mission is provided by NASA’s Science Mission Directorate.
APPENDIX: DERIVATION OF THE FORMULA RELATING CENTROID SHIFTS TO TRANSIT SOURCE LOCATION
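Assume that we are observing a target star with flux $b_0$ at $(\alpha_0, \delta_0)$, with $N$ nearby stars at R.A. and decl. $(\alpha_j, \delta_j)$, $j = 1, \ldots, N$, and flux $b_j$. Assume that star $k$, with $k \neq 0$, is a background eclipsing binary with fractional eclipse depth $d_{\rm back}$ [so the flux of star $k$ in mid eclipse is $(1 - d_{\rm back})b_k$]. We model the PSF of each star with a function $f(\alpha,\delta)$ that has the following properties, where the integrals are taken over the domain where $f > 0$:
- 1. $f(\alpha,\delta)$ has finite support ($f = 0$ outside of a finite area).
- 2. $\int f(\alpha,\delta)\,d\alpha\,d\delta = 1$. In other words, $f$ has unit flux, so $b_j f$ has the total flux
$$\int b_j f(\alpha,\delta)\,d\alpha\,d\delta = b_j.$$
- 3. $\int \alpha f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta = \alpha_j$ and $\int \delta f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta = \delta_j$; for example,
$$\frac{\int \alpha\, b_j f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta}{\int b_j f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta} = \alpha_j,$$
so the centroid of an isolated star is the same as that star’s position.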
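We now consider an aperture on the sky that may not completely capture all flux from stars in the aperture and may contain flux from stars outside the aperture. Therefore, in general, $\int_{\rm ap} b_j f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta \neq b_j$, $\int_{\rm ap} \alpha f(\alpha - \alpha_k,\delta - \delta_k)\,d\alpha\,d\delta \neq \alpha_k$, and $\int_{\rm ap} \delta f(\alpha - \alpha_k,\delta - \delta_k)\,d\alpha\,d\delta \neq \delta_k$, where $\int_{\rm ap}$ denotes an integral over the aperture. We model the background flux as an arbitrary function $B(\alpha,\delta)$. We denote the total flux in the aperture by:
$$F_{\rm ap} = \int_{\rm ap} \left[\sum_{j=0}^{N} b_j f(\alpha - \alpha_j,\delta - \delta_j) + B(\alpha,\delta)\right] d\alpha\,d\delta.$$
To simplify the following discussion, we define the notation:
$$f_j^{\rm ap} = \int_{\rm ap} f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta, \qquad B_{\rm ap} = \int_{\rm ap} B(\alpha,\delta)\,d\alpha\,d\delta,$$
$$f_j^{\alpha,{\rm ap}} = \int_{\rm ap} \alpha f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta, \qquad B_{\rm ap}^{\alpha} = \int_{\rm ap} \alpha B(\alpha,\delta)\,d\alpha\,d\delta,$$
$$f_j^{\delta,{\rm ap}} = \int_{\rm ap} \delta f(\alpha - \alpha_j,\delta - \delta_j)\,d\alpha\,d\delta, \qquad B_{\rm ap}^{\delta} = \int_{\rm ap} \delta B(\alpha,\delta)\,d\alpha\,d\delta.$$
So $b_j f_j^{\rm ap}$ is the flux from star $j$ in the aperture, $B_{\rm ap}$ is the background flux in the aperture, and the superscript $\alpha$ or $\delta$ indicates the first moment in R.A. or decl. of these quantities. Then
$$F_{\rm ap} = \sum_{j=0}^{N} b_j f_j^{\rm ap} + B_{\rm ap}.$$
The out-of-transit centroid (including all flux in the aperture) is given by
$$\alpha_{\rm out} = \frac{1}{F_{\rm ap}}\left(\sum_{j=0}^{N} b_j f_j^{\alpha,{\rm ap}} + B_{\rm ap}^{\alpha}\right), \qquad \delta_{\rm out} = \frac{1}{F_{\rm ap}}\left(\sum_{j=0}^{N} b_j f_j^{\delta,{\rm ap}} + B_{\rm ap}^{\delta}\right).$$
The in-transit centroid is given by
$$\alpha_{\rm in} = \frac{\sum_{j=0}^{N} b_j f_j^{\alpha,{\rm ap}} + B_{\rm ap}^{\alpha} - d_{\rm back} b_k f_k^{\alpha,{\rm ap}}}{F_{\rm ap} - d_{\rm back} b_k f_k^{\rm ap}} = \frac{F_{\rm ap}\,\alpha_{\rm out} - d_{\rm back} b_k f_k^{\alpha,{\rm ap}}}{F_{\rm ap} - d_{\rm back} b_k f_k^{\rm ap}},$$
$$\delta_{\rm in} = \frac{F_{\rm ap}\,\delta_{\rm out} - d_{\rm back} b_k f_k^{\delta,{\rm ap}}}{F_{\rm ap} - d_{\rm back} b_k f_k^{\rm ap}}.$$
The observed depth $d_{\rm obs}$ is defined so that the observed flux in mid eclipse is $(1 - d_{\rm obs})F_{\rm ap}$. Assuming that the eclipse is the only cause of a change in flux, the observed flux in mid eclipse is also given by $F_{\rm ap} - d_{\rm back} b_k f_k^{\rm ap}$. Therefore, $(1 - d_{\rm obs})F_{\rm ap} = F_{\rm ap} - d_{\rm back} b_k f_k^{\rm ap}$, so $d_{\rm obs}F_{\rm ap} = d_{\rm back} b_k f_k^{\rm ap}$.
The centroid shift is given by
$$\Delta\alpha = \alpha_{\rm in} - \alpha_{\rm out} = \frac{d_{\rm obs}}{1 - d_{\rm obs}}\left(\alpha_{\rm out} - \frac{f_k^{\alpha,{\rm ap}}}{f_k^{\rm ap}}\right), \qquad \Delta\delta = \delta_{\rm in} - \delta_{\rm out} = \frac{d_{\rm obs}}{1 - d_{\rm obs}}\left(\delta_{\rm out} - \frac{f_k^{\delta,{\rm ap}}}{f_k^{\rm ap}}\right).$$
We define
$$\bar{\alpha}_k = \frac{f_k^{\alpha,{\rm ap}}}{f_k^{\rm ap}}, \qquad \bar{\delta}_k = \frac{f_k^{\delta,{\rm ap}}}{f_k^{\rm ap}},$$
which are the R.A. and decl. of the centroid of the flux of the transit source $k$ in the aperture when all other flux is absent (alternatively, this is the centroid of the difference image formed by subtracting in-transit pixels from out-of-transit pixels when all other flux is constant). Therefore, this centroid is given by:
$$\bar{\alpha}_k = \alpha_{\rm out} - \left(\frac{1}{d_{\rm obs}} - 1\right)\Delta\alpha, \qquad \bar{\delta}_k = \delta_{\rm out} - \left(\frac{1}{d_{\rm obs}} - 1\right)\Delta\delta. \tag{A1}$$
The quantities $(\bar{\alpha}_k, \bar{\delta}_k)$ approximate the transit source location $(\alpha_k, \delta_k)$, with the error in this approximation decreasing as more flux from the transit source is captured in the aperture. When all flux from the transit source is captured in the aperture, $(\bar{\alpha}_k, \bar{\delta}_k) = (\alpha_k, \delta_k)$.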
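The relation between the centroid shift, the observed depth, and the centroid of the transit source's flux in the aperture can be checked numerically. The following is a minimal sketch under stated assumptions: a pixelated Gaussian stands in for the PSF, positions are in pixel units with x playing the role of R.A. and y of decl., and the star positions, fluxes, aperture shape, and function names are all hypothetical choices made for illustration. Because the derivation only uses sums of flux and first moments, the identity holds for pixel sums as well as for integrals.

```python
import numpy as np


def star_image(xx, yy, x0, y0, flux, width=1.0):
    """Pixelated Gaussian stand-in for the PSF, normalised to total `flux`."""
    g = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2.0 * width ** 2))
    return flux * g / g.sum()


def aperture_moments(img, xx, yy, mask):
    """Return (flux, x centroid, y centroid) of img inside the aperture mask."""
    flux = img[mask].sum()
    x_c = (xx[mask] * img[mask]).sum() / flux
    y_c = (yy[mask] * img[mask]).sum() / flux
    return flux, x_c, y_c


def check_centroid_relation(d_back=0.5):
    """Compare the depth-scaled centroid shift with star k's in-aperture centroid."""
    yy, xx = np.indices((41, 41), dtype=float)
    target = star_image(xx, yy, 20.0, 20.0, 1000.0)
    neighbour = star_image(xx, yy, 16.0, 24.0, 100.0)
    star_k = star_image(xx, yy, 23.5, 21.0, 50.0)  # eclipsing background star
    background = np.full(xx.shape, 0.5)            # constant background flux

    # Circular aperture of radius 3 pixels around the target star.
    mask = (xx - 20.0) ** 2 + (yy - 20.0) ** 2 <= 9.0

    out_img = target + neighbour + star_k + background
    in_img = target + neighbour + (1.0 - d_back) * star_k + background

    f_out, x_out, y_out = aperture_moments(out_img, xx, yy, mask)
    f_in, x_in, y_in = aperture_moments(in_img, xx, yy, mask)

    d_obs = 1.0 - f_in / f_out      # observed depth in the same aperture
    scale = 1.0 / d_obs - 1.0
    predicted = (x_out - scale * (x_in - x_out),
                 y_out - scale * (y_in - y_out))

    # Centroid of star k's flux alone, restricted to the aperture.
    _, x_k, y_k = aperture_moments(star_k, xx, yy, mask)
    return predicted, (x_k, y_k)


predicted, direct = check_centroid_relation()
print(predicted, direct)
```

In this configuration star k lies outside the aperture, so the recovered position is the centroid of the wings of star k that fall inside the aperture rather than the true position of star k, which illustrates why the approximation improves as more of the transit source's flux is captured.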
In the Kepler pipeline implementation, the transit depth is estimated using the optimal aperture (Bryson et al. 2010b), while the centroids are measured using the optimal aperture plus a one-pixel ring around the optimal aperture. This is because some optimal apertures consist of only a single pixel, which cannot be usefully centroided. This use of one aperture for centroid computation and a smaller aperture to estimate observed transit depth invalidates the conclusion of the above analysis because $d_{\rm obs}$ in equation (A1) is different from the depth $d_{\rm obs}^{\rm optAp}$ determined using the optimal aperture.
We can estimate the difference in these observed depths and predict the impact on the estimated transit source position. For the aperture used for centroiding, we have the relation $d_{\rm obs}F_{\rm ap} = d_{\rm back} b_k f_k^{\rm ap}$, while for the optimal aperture we have the analogous relation $d_{\rm obs}^{\rm optAp}F_{\rm optAp} = d_{\rm back} b_k f_k^{\rm optAp}$, where $F_{\rm optAp}$ and $f_k^{\rm optAp}$ denote the total flux and the fraction of the flux of star $k$ in the optimal aperture. Solving both relations for $d_{\rm back}b_k$ and equating, we find
$$d_{\rm obs}^{\rm optAp} = d_{\rm obs}\,\frac{F_{\rm ap}}{F_{\rm optAp}}\,\frac{f_k^{\rm optAp}}{f_k^{\rm ap}}.$$
Because the optimal aperture is contained within the aperture used for centroiding, $F_{\rm ap}/F_{\rm optAp} > 1$ while $f_k^{\rm optAp}/f_k^{\rm ap} < 1$. In the typical case where the background star is much dimmer than the target star, $F_{\rm ap}/F_{\rm optAp}$ will be not much greater than one, while $f_k^{\rm optAp}/f_k^{\rm ap}$ can be very close to zero, for example when the core of star $k$ is in the pixel ring and only its wings are in the optimal aperture. Therefore, $d_{\rm obs}^{\rm optAp}$ can be much smaller than $d_{\rm obs}$, resulting in a significant overshoot of star $k$’s position in equation (A1). This overshoot is particularly likely to happen when star $k$ is outside the optimal aperture, in other words for background stars further from the target star. When star $k$ is brighter than stars in the optimal aperture, including the target star, the overshoot is reduced because the flux in the aperture is dominated by the flux from star $k$. When star $k$ is in the optimal aperture, the impact on equation (A1) is much less dramatic, and it can provide a very good estimate of the transiting star’s position.
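The overshoot caused by mixing apertures can be illustrated with a small simulation. This is a sketch under stated assumptions, not the pipeline's computation: the PSF is a pixelated Gaussian, the "optimal" aperture is taken to be a 3×3 pixel box with the centroiding aperture a 5×5 box (optimal aperture plus a one-pixel ring), positions are in pixel units along a single axis of interest, and all names and numerical values are hypothetical.

```python
import numpy as np


def gaussian_star(xx, yy, x0, y0, flux, width=1.0):
    """Pixelated Gaussian stand-in for the PSF, normalised to total `flux`."""
    g = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2.0 * width ** 2))
    return flux * g / g.sum()


def box_mask(xx, yy, x0, y0, half_width):
    """Square aperture of (2*half_width + 1) pixels on a side."""
    return (np.abs(xx - x0) <= half_width) & (np.abs(yy - y0) <= half_width)


def depth_mismatch(x_k=22.3, b_k=20.0, d_back=0.5):
    """Estimate star k's x position using consistent and mixed apertures."""
    yy, xx = np.indices((41, 41), dtype=float)
    target = gaussian_star(xx, yy, 20.0, 20.0, 1000.0)
    star_k = gaussian_star(xx, yy, x_k, 20.0, b_k)  # core lies in the pixel ring
    out_img = target + star_k
    in_img = target + (1.0 - d_back) * star_k

    opt = box_mask(xx, yy, 20.0, 20.0, 1)  # assumed 3x3 "optimal" aperture
    ap = box_mask(xx, yy, 20.0, 20.0, 2)   # optimal aperture plus one-pixel ring

    F_ap, F_opt = out_img[ap].sum(), out_img[opt].sum()
    d_obs = 1.0 - in_img[ap].sum() / F_ap     # depth in the centroiding aperture
    d_opt = 1.0 - in_img[opt].sum() / F_opt   # depth in the optimal aperture

    x_out = (xx[ap] * out_img[ap]).sum() / F_ap
    x_in = (xx[ap] * in_img[ap]).sum() / in_img[ap].sum()
    shift = x_in - x_out

    return {
        "d_obs": d_obs,
        "d_opt": d_opt,
        "F_ratio": F_ap / F_opt,
        "fk_ratio": star_k[opt].sum() / star_k[ap].sum(),
        # Same aperture for depth and centroid: centroid of k's flux in `ap`.
        "x_consistent": x_out - (1.0 / d_obs - 1.0) * shift,
        # Depth from the optimal aperture, centroid from the larger aperture.
        "x_mixed": x_out - (1.0 / d_opt - 1.0) * shift,
    }


result = depth_mismatch()
print(result)
```

In this example the depth measured in the smaller aperture is lower than the depth in the centroiding aperture, so the mixed-aperture estimate lands beyond the true position of star k, while the consistent estimate falls short of it because only part of star k's flux is inside the aperture.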






