Get SMARTER samples list
This notebook retrieves the list of samples in the SMARTER dataset. It is another example of how to use the SheepEndpoint class, this time to get the list of samples:
import pandas as pd
from tskitetude import get_data_dir
from tskitetude.smarterapi import SheepEndpoint
Collect all samples by calling SheepEndpoint.get_samples() with no parameters. The response is paginated, so keep requesting the following page until data["next"] is None:
sheep_api = SheepEndpoint()
data = sheep_api.get_samples()
page = 1
sheep = pd.DataFrame(data["items"])
while data["next"] is not None:
    data = sheep_api.get_samples(page=page + 1)
    df_page = pd.DataFrame(data["items"])
    page = data["page"]
    sheep = pd.concat([sheep, df_page], ignore_index=True)
sheep.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11837 entries, 0 to 11836
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 _id 11837 non-null object
1 breed 11837 non-null object
2 breed_code 11837 non-null object
3 chip_name 11837 non-null object
4 country 11837 non-null object
5 dataset_id 11837 non-null object
6 locations 9295 non-null object
7 metadata 9298 non-null object
8 original_id 11837 non-null object
9 phenotype 1582 non-null object
10 smarter_id 11837 non-null object
11 species 11837 non-null object
12 type 11837 non-null object
13 father_id 279 non-null object
14 mother_id 63 non-null object
15 sex 3054 non-null float64
16 alias 2059 non-null object
dtypes: float64(1), object(16)
memory usage: 1.5+ MB
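The pagination loop above can also be written as a small reusable helper. The following is a minimal sketch, not part of tskitetude: `collect_all_items` and `fake_fetch` are hypothetical names, and the only assumption about the API is what the loop above already relies on, namely that each response is a dict with an "items" list and a "next" key that is None on the last page.

```python
from typing import Any, Callable, Dict, List


def collect_all_items(
    fetch_page: Callable[[int], Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Request pages one by one until 'next' is None and return every item."""
    page = 1
    data = fetch_page(page)
    items = list(data["items"])
    while data["next"] is not None:
        page += 1
        data = fetch_page(page)
        items.extend(data["items"])
    return items


def fake_fetch(page: int) -> Dict[str, Any]:
    """Hypothetical stand-in for the API: three pages of two records each."""
    total_pages = 3
    start = (page - 1) * 2
    return {
        "items": [{"smarter_id": f"ID-{i}"} for i in range(start, start + 2)],
        "page": page,
        "next": f"?page={page + 1}" if page < total_pages else None,
    }


records = collect_all_items(fake_fetch)
print(len(records))  # 6
```

With the real client, the same helper could presumably be called as `collect_all_items(lambda p: sheep_api.get_samples(page=p))` and the result passed to `pd.DataFrame(records)`. Building the dataframe once from a plain list avoids calling `pd.concat` on every page.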
Inspecting the first 5 samples of the dataframe:
sheep.head()
| | _id | breed | breed_code | chip_name | country | dataset_id | locations | metadata | original_id | phenotype | smarter_id | species | type | father_id | mother_id | sex | alias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'$oid': '66544dd515d202ad5f28dc8e'} | Texel | TEX | IlluminaOvineSNP50 | Uruguay | {'$oid': '604f75a61a08c53cebd09b67'} | {'coordinates': [[-54.79980235632396, -32.8596... | {'gps_2': 'https://www.google.com/maps/place/E... | 20181210002 | {'purpose': 'Meat'} | UYOA-TEX-000000001 | Ovis aries | background | NaN | NaN | NaN | NaN |
| 1 | {'$oid': '66544dd615d202ad5f28dc8f'} | Texel | TEX | IlluminaOvineSNP50 | Uruguay | {'$oid': '604f75a61a08c53cebd09b67'} | {'coordinates': [[-54.79980235632396, -32.8596... | {'gps_2': 'https://www.google.com/maps/place/E... | 20181210003 | {'purpose': 'Meat'} | UYOA-TEX-000000002 | Ovis aries | background | NaN | NaN | NaN | NaN |
| 2 | {'$oid': '66544dd615d202ad5f28dc90'} | Texel | TEX | IlluminaOvineSNP50 | Uruguay | {'$oid': '604f75a61a08c53cebd09b67'} | {'coordinates': [[-54.79980235632396, -32.8596... | {'gps_2': 'https://www.google.com/maps/place/E... | 20181210005 | {'purpose': 'Meat'} | UYOA-TEX-000000003 | Ovis aries | background | NaN | NaN | NaN | NaN |
| 3 | {'$oid': '66544dd715d202ad5f28dc91'} | Texel | TEX | IlluminaOvineSNP50 | Uruguay | {'$oid': '604f75a61a08c53cebd09b67'} | {'coordinates': [[-54.79980235632396, -32.8596... | {'gps_2': 'https://www.google.com/maps/place/E... | 20181210006 | {'purpose': 'Meat'} | UYOA-TEX-000000004 | Ovis aries | background | NaN | NaN | NaN | NaN |
| 4 | {'$oid': '66544dd715d202ad5f28dc92'} | Texel | TEX | IlluminaOvineSNP50 | Uruguay | {'$oid': '604f75a61a08c53cebd09b67'} | {'coordinates': [[-54.79980235632396, -32.8596... | {'gps_2': 'https://www.google.com/maps/place/E... | 20181210008 | {'purpose': 'Meat'} | UYOA-TEX-000000005 | Ovis aries | background | NaN | NaN | NaN | NaN |
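The `_id` and `dataset_id` columns hold `{'$oid': ...}` wrappers rather than plain strings, which makes them awkward to filter or join on. One possible way to unwrap them before building the dataframe is sketched below; `flatten_record` is a hypothetical helper, not part of tskitetude, and the sample record only reuses values shown in the first row of the table above.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy where {'$oid': value} wrappers are replaced by value."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict) and set(value) == {"$oid"}:
            flat[key] = value["$oid"]
        else:
            # Other nested values (e.g. phenotype) are left untouched
            flat[key] = value
    return flat


sample = {
    "_id": {"$oid": "66544dd515d202ad5f28dc8e"},
    "breed": "Texel",
    "dataset_id": {"$oid": "604f75a61a08c53cebd09b67"},
    "phenotype": {"purpose": "Meat"},
}

flat = flatten_record(sample)
print(flat["_id"])         # 66544dd515d202ad5f28dc8e
print(flat["dataset_id"])  # 604f75a61a08c53cebd09b67
```

Applied to the API output, this would look like `pd.DataFrame([flatten_record(r) for r in data["items"]])`.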
Collect some useful information about the samples into a CSV file:
sheep[["smarter_id", "breed", "breed_code", "phenotype"]].to_csv(
    get_data_dir() / "smarter_sheep_list.csv", index=False)
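Note that the phenotype column contains dictionaries, so it ends up in the CSV as text. The sketch below shows one way to turn that text back into dictionaries when reading the file. It assumes that each dictionary is written as its Python representation and that missing values become empty fields; `parse_phenotype` is a hypothetical helper, and the second row of the inline CSV is made up to illustrate a sample without a phenotype.

```python
import ast
import csv
import io

# Inline stand-in for smarter_sheep_list.csv
csv_text = (
    "smarter_id,breed,breed_code,phenotype\n"
    "UYOA-TEX-000000001,Texel,TEX,{'purpose': 'Meat'}\n"
    "HYPO-XXX-000000001,Example,XXX,\n"  # hypothetical row, no phenotype
)


def parse_phenotype(text):
    """Convert the stored text back into a dict, or None when empty."""
    return ast.literal_eval(text) if text else None


rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    row["phenotype"] = parse_phenotype(row["phenotype"])
    rows.append(row)

print(rows[0]["phenotype"]["purpose"])  # Meat
print(rows[1]["phenotype"])             # None
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of round trip. If the nested structure matters downstream, writing the column as JSON in the first place may be a cleaner option.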