Get SMARTER samples list

This notebook retrieves the list of samples in the SMARTER dataset. It is another example of how to use the SheepEndpoint class, here to get the list of samples:

import pandas as pd

from tskitetude import get_data_dir
from tskitetude.smarterapi import SheepEndpoint

Collecting all samples by calling the SheepEndpoint.get_samples() method with no filter parameters and following the pagination until no next page is reported:

sheep_api = SheepEndpoint()

data = sheep_api.get_samples()
page = 1
sheep = pd.DataFrame(data["items"])

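# Keep requesting pages until the response reports no "next" page
while data["next"] is not None:
    data = sheep_api.get_samples(page=page + 1)
    df_page = pd.DataFrame(data["items"])
    page = data["page"]  # remember the page that was just fetched
    sheep = pd.concat([sheep, df_page], ignore_index=True)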
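
The loop above can be wrapped in a small reusable helper. The following is a minimal sketch, assuming each response is a dict with "items", "next" and "page" keys as in the code above. The names collect_all_pages and fake_get_samples are hypothetical and are not part of tskitetude; the stub only imitates the response shape so the helper can be tried offline.

```python
import pandas as pd


def collect_all_pages(fetch_page):
    """Call fetch_page(page=N) until the response has no 'next' link.

    fetch_page is any callable returning a dict with 'items', 'next'
    and 'page' keys (assumed response shape, as in the loop above).
    """
    frames = []
    page = 1
    while True:
        data = fetch_page(page=page)
        frames.append(pd.DataFrame(data["items"]))
        if data["next"] is None:
            break
        page = data["page"] + 1
    # Concatenate once at the end instead of growing a frame per page
    return pd.concat(frames, ignore_index=True)


# Hypothetical stand-in for SheepEndpoint().get_samples, for offline testing
def fake_get_samples(page=1, size=2, total=5):
    records = [{"smarter_id": f"ID-{i}"} for i in range(total)]
    start = (page - 1) * size
    has_next = start + size < total
    return {
        "items": records[start:start + size],
        "page": page,
        "next": f"?page={page + 1}" if has_next else None,
    }


samples = collect_all_pages(fake_get_samples)
```

With the real client this would presumably be called as collect_all_pages(sheep_api.get_samples), since get_samples accepts a page keyword in the loop above. Collecting the per-page frames in a list and concatenating once avoids copying the growing dataframe on every iteration.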

sheep.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11837 entries, 0 to 11836
Data columns (total 17 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   _id          11837 non-null  object 
 1   breed        11837 non-null  object 
 2   breed_code   11837 non-null  object 
 3   chip_name    11837 non-null  object 
 4   country      11837 non-null  object 
 5   dataset_id   11837 non-null  object 
 6   locations    9295 non-null   object 
 7   metadata     9298 non-null   object 
 8   original_id  11837 non-null  object 
 9   phenotype    1582 non-null   object 
 10  smarter_id   11837 non-null  object 
 11  species      11837 non-null  object 
 12  type         11837 non-null  object 
 13  father_id    279 non-null    object 
 14  mother_id    63 non-null     object 
 15  sex          3054 non-null   float64
 16  alias        2059 non-null   object 
dtypes: float64(1), object(16)
memory usage: 1.5+ MB

Inspecting the first 5 samples of the dataframe:

sheep.head()
_id breed breed_code chip_name country dataset_id locations metadata original_id phenotype smarter_id species type father_id mother_id sex alias
0 {'$oid': '66544dd515d202ad5f28dc8e'} Texel TEX IlluminaOvineSNP50 Uruguay {'$oid': '604f75a61a08c53cebd09b67'} {'coordinates': [[-54.79980235632396, -32.8596... {'gps_2': 'https://www.google.com/maps/place/E... 20181210002 {'purpose': 'Meat'} UYOA-TEX-000000001 Ovis aries background NaN NaN NaN NaN
1 {'$oid': '66544dd615d202ad5f28dc8f'} Texel TEX IlluminaOvineSNP50 Uruguay {'$oid': '604f75a61a08c53cebd09b67'} {'coordinates': [[-54.79980235632396, -32.8596... {'gps_2': 'https://www.google.com/maps/place/E... 20181210003 {'purpose': 'Meat'} UYOA-TEX-000000002 Ovis aries background NaN NaN NaN NaN
2 {'$oid': '66544dd615d202ad5f28dc90'} Texel TEX IlluminaOvineSNP50 Uruguay {'$oid': '604f75a61a08c53cebd09b67'} {'coordinates': [[-54.79980235632396, -32.8596... {'gps_2': 'https://www.google.com/maps/place/E... 20181210005 {'purpose': 'Meat'} UYOA-TEX-000000003 Ovis aries background NaN NaN NaN NaN
3 {'$oid': '66544dd715d202ad5f28dc91'} Texel TEX IlluminaOvineSNP50 Uruguay {'$oid': '604f75a61a08c53cebd09b67'} {'coordinates': [[-54.79980235632396, -32.8596... {'gps_2': 'https://www.google.com/maps/place/E... 20181210006 {'purpose': 'Meat'} UYOA-TEX-000000004 Ovis aries background NaN NaN NaN NaN
4 {'$oid': '66544dd715d202ad5f28dc92'} Texel TEX IlluminaOvineSNP50 Uruguay {'$oid': '604f75a61a08c53cebd09b67'} {'coordinates': [[-54.79980235632396, -32.8596... {'gps_2': 'https://www.google.com/maps/place/E... 20181210008 {'purpose': 'Meat'} UYOA-TEX-000000005 Ovis aries background NaN NaN NaN NaN
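
Several columns in this preview hold nested dictionaries, for example _id and phenotype. Below is a minimal sketch of unpacking them into plain columns. It uses a small hand-made frame that only mimics the structure of the rows above; the helper dict_get and the new column names oid and purpose are illustrative assumptions, not part of the SMARTER API.

```python
import pandas as pd

# Tiny frame mimicking the structure of the preview (values are illustrative)
demo = pd.DataFrame({
    "_id": [{"$oid": "aaa"}, {"$oid": "bbb"}],
    "smarter_id": ["UYOA-TEX-000000001", "UYOA-TEX-000000002"],
    "phenotype": [{"purpose": "Meat"}, None],
})


def dict_get(value, key):
    """Return value[key] for dicts, None for missing entries (NaN/None)."""
    return value.get(key) if isinstance(value, dict) else None


# Pull one key out of each nested dictionary into its own column
demo["oid"] = demo["_id"].map(lambda v: dict_get(v, "$oid"))
demo["purpose"] = demo["phenotype"].map(lambda v: dict_get(v, "purpose"))
```

The isinstance check matters because phenotype is missing for most samples, so a plain v["purpose"] lookup would fail on those rows.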

Writing a few useful columns (SMARTER ID, breed, breed code and phenotype) to a CSV file:

sheep[["smarter_id", "breed", "breed_code", "phenotype"]].to_csv(
    get_data_dir() / "smarter_sheep_list.csv", index=False)
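
Because phenotype holds dictionaries, to_csv writes each one as its string representation. The following is a minimal sketch of reading such a file back and restoring the dictionaries, assuming the stored values are simple Python literals. It uses an in-memory buffer and illustrative data rather than the real smarter_sheep_list.csv file.

```python
import ast
import io

import pandas as pd

demo = pd.DataFrame({
    "smarter_id": ["UYOA-TEX-000000001", "UYOA-TEX-000000002"],
    "phenotype": [{"purpose": "Meat"}, None],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
# Dictionaries come back as strings such as "{'purpose': 'Meat'}";
# missing values come back as NaN and are left untouched
restored["phenotype"] = restored["phenotype"].map(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for text read from a file.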