Handling Missing Coordinates Guide

Locator lets you choose how samples that lack geographic coordinates (NA samples) are treated, through the na_action parameter. You can set na_action once at the instance level and override it for an individual method call.
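The precedence rule (a per-call value wins over the instance default) can be pictured with a small sketch. The class NaConfig and its resolve method are hypothetical and exist only to illustrate that rule. They are not part of Locator's API, so check the Locator API reference for the real method signatures.

```python
from typing import Optional

# The three modes described in this guide.
VALID_NA_ACTIONS = ("separate", "exclude", "fail")


class NaConfig:
    """Hypothetical holder for an instance-level na_action default."""

    def __init__(self, na_action: str = "separate") -> None:
        self.na_action = self._validate(na_action)

    @staticmethod
    def _validate(value: str) -> str:
        if value not in VALID_NA_ACTIONS:
            raise ValueError(
                f"na_action must be one of {VALID_NA_ACTIONS}, got {value!r}"
            )
        return value

    def resolve(self, override: Optional[str] = None) -> str:
        """Return the per-call override if one is given, else the instance default."""
        if override is None:
            return self.na_action
        return self._validate(override)


config = NaConfig("separate")
print(config.resolve())           # separate
print(config.resolve("exclude"))  # exclude
```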

Understanding NA Samples

Samples are considered to have “NA” (missing) coordinates when:

  • The x (longitude) coordinate is NaN or missing

  • The y (latitude) coordinate is NaN or missing

A sample needs both x and y to count as having a known location; if either coordinate is missing, the sample is treated as NA.
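As a minimal sketch of this rule, assuming coordinates are stored as floats with NaN or None marking a missing value, the check looks like this. The function has_known_location is a hypothetical helper, not a Locator function.

```python
import math
from typing import Optional


def has_known_location(x: Optional[float], y: Optional[float]) -> bool:
    """Return True only if both coordinates are present and not NaN."""
    for value in (x, y):
        if value is None or math.isnan(value):
            return False
    return True


print(has_known_location(-122.3, 47.6))        # True
print(has_known_location(float("nan"), 47.6))  # False
print(has_known_location(-122.3, None))        # False
```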

NA Action Modes

separate (default)

Splits samples into a training set (known locations) and a prediction set that covers the samples with unknown locations. Training uses only samples with known coordinates; predictions are generated for all samples.

locator = Locator({"na_action": "separate"})

Use when: You have new samples without known locations that you want to predict.

exclude

Filters out all samples without coordinates before any analysis. Only samples with known locations participate, so the dataset shrinks by the number of NA samples.

locator = Locator({"na_action": "exclude"})

Use when: You only want to analyze samples with verified locations.

fail

Raises a ValueError if any sample lacks coordinates, forcing you to resolve missing data explicitly before proceeding.

locator = Locator({"na_action": "fail"})

Use when: You want to enforce data completeness in a QC pipeline.
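The three modes can be compared side by side in a short sketch of the selection logic. It assumes samples are plain records with x and y attributes; the Sample class and the apply_na_action function are hypothetical illustrations of the behavior described above, not Locator's internal implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    sample_id: str
    x: Optional[float]
    y: Optional[float]

    @property
    def known(self) -> bool:
        # A location is known only if both coordinates are present and not NaN.
        return all(v is not None and not math.isnan(v) for v in (self.x, self.y))


def apply_na_action(
    samples: List[Sample], na_action: str
) -> Tuple[List[Sample], List[Sample]]:
    """Return (training_set, prediction_set) for a hypothetical na_action mode."""
    known = [s for s in samples if s.known]
    unknown = [s for s in samples if not s.known]

    if na_action == "separate":
        # Train on known samples only; generate predictions for every sample.
        return known, list(samples)
    if na_action == "exclude":
        # Drop NA samples entirely; both sets contain only known samples.
        return known, known
    if na_action == "fail":
        # Refuse to continue while any sample lacks coordinates.
        if unknown:
            ids = ", ".join(s.sample_id for s in unknown)
            raise ValueError(f"{len(unknown)} sample(s) lack coordinates: {ids}")
        return known, known
    raise ValueError(f"unknown na_action: {na_action!r}")


samples = [
    Sample("s1", -122.3, 47.6),
    Sample("s2", float("nan"), 45.0),
    Sample("s3", 10.0, None),
]
train, predict = apply_na_action(samples, "separate")
print(len(train), len(predict))  # 1 3
train, predict = apply_na_action(samples, "exclude")
print(len(train), len(predict))  # 1 1
```

Returning both sets from one function keeps the mode switch in a single place, which makes it easy to see that 'exclude' and 'fail' differ only in whether NA samples are dropped silently or reported as an error.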

Checking Your Data

Always inspect your data before analysis:

genotypes, samples = locator.load_genotypes(vcf="data.vcf")
locator.check_data(genotypes, samples, verbose=True)

This prints a summary like the following (counts depend on your data):

===== Data Summary =====
Total samples: 231
Samples with coordinates: 211
Samples without coordinates: 20
Total SNPs: 1000

Current NA handling mode: separate
- Will train on samples with known locations
- Can predict on samples without locations

Samples without coordinates (first 10):
- sample_X123
- sample_X124
- ...
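The counts in such a summary follow directly from the NA rule: samples with coordinates plus samples without coordinates always equal the total. The sketch below derives similar lines from a hypothetical mapping of sample IDs to (x, y) pairs. The function summarize is an illustration only and is not how check_data is implemented.

```python
import math
from typing import Dict, List, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float]]


def is_known(coords: Coords) -> bool:
    # Both coordinates must be present and not NaN.
    return all(v is not None and not math.isnan(v) for v in coords)


def summarize(
    coords_by_sample: Dict[str, Coords], n_snps: int, max_listed: int = 10
) -> List[str]:
    """Build summary lines similar in spirit to the report shown above."""
    missing = [sid for sid, c in coords_by_sample.items() if not is_known(c)]
    total = len(coords_by_sample)
    lines = [
        "===== Data Summary =====",
        f"Total samples: {total}",
        f"Samples with coordinates: {total - len(missing)}",
        f"Samples without coordinates: {len(missing)}",
        f"Total SNPs: {n_snps}",
    ]
    if missing:
        lines.append(f"Samples without coordinates (first {max_listed}):")
        lines.extend(f"- {sid}" for sid in missing[:max_listed])
    return lines


demo = {
    "sample_A001": (12.5, 41.9),
    "sample_X123": (float("nan"), 45.1),
    "sample_X124": (None, None),
}
print("\n".join(summarize(demo, n_snps=1000)))
```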

Method-Specific Behavior

Methods differ in how they apply na_action, depending on whether they need known coordinates to evaluate their results.

Methods Supporting Full ‘separate’ Mode

These methods can train on known samples and predict on unknown samples:

  • train() / predict()

  • run_bootstraps()

  • run_windows()

  • run_jacknife()

Holdout Methods

Holdout methods evaluate predictions against known coordinates, so they can only use samples that have them. In these methods, ‘separate’ behaves like ‘exclude’, and only samples with known coordinates are used:

  • run_holdouts()

  • run_k_fold_holdouts()

  • run_jacknife_holdouts()

  • run_windows_holdouts()
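The method-dependent behavior amounts to a simple mapping from the requested mode to the effective one. In the sketch below, effective_na_action is a hypothetical helper; the method names come from the lists above. The guide does not say how holdout methods treat 'fail', so passing it through unchanged is an assumption.

```python
# Holdout methods listed in this guide.
HOLDOUT_METHODS = frozenset({
    "run_holdouts",
    "run_k_fold_holdouts",
    "run_jacknife_holdouts",
    "run_windows_holdouts",
})


def effective_na_action(method: str, na_action: str) -> str:
    """Return the mode a method effectively applies (hypothetical helper)."""
    if method in HOLDOUT_METHODS and na_action == "separate":
        # Holdout evaluation needs true coordinates, so NA samples are dropped.
        return "exclude"
    # Assumption: every other combination, including 'fail', is passed through.
    return na_action


print(effective_na_action("run_holdouts", "separate"))    # exclude
print(effective_na_action("run_bootstraps", "separate"))  # separate
```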