{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T cell epitopes of SARS-CoV2\n",
    "\n",
    "## Methods\n",
    "\n",
    "* Predict MHC-I binders for sars-cov2 reference sequences (S and N important)\n",
    "* Align with sars-cov and get conserved epitopes.\n",
    "* Best alleles to use?\n",
    "* Multiple sequence alignment of each protein to reference\n",
    "* find conservation of binders with closest peptide in each HCov sequence and determine identity\n",
    "\n",
    "## References\n",
    "\n",
    "* J. Mateus et al., “Selective and cross-reactive SARS-CoV-2 T cell epitopes in unexposed humans,” Science (80-. )., vol. 3871, no. August, p. eabd3871, Aug. 2020.\n",
    "* S. F. Ahmed, A. A. Quadeer, and M. R. McKay, “Preliminary Identification of Potential Vaccine Targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies.,” Viruses, vol. 12, no. 3, 2020.\n",
    "* A. Grifoni et al., “A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2,” Cell Host Microbe, pp. 1–10, 2020.\n",
    "* V. Baruah and S. Bose, “Immunoinformatics-aided identification of T cell and B cell epitopes in the surface glycoprotein of 2019-nCoV,” J. Med. Virol., no. February, pp. 495–500, 2020.\n",
    "\n",
    "## Common coronoviruses\n",
    "\n",
    "* https://www.cdc.gov/coronavirus/types.html\n",
    "\n",
    "## How to use\n",
    "\n",
    "* You should install epitopepredict to use this notebook (https://github.com/dmnfarrell/epitopepredict)\n",
    "* Annotation is done here with pathogenie which you can install using:\n",
    "\n",
    "`pip install -e git+https://github.com/dmnfarrell/pathogenie.git#egg=pathogenie`\n",
    "\n",
    "* You will also need to install blast and clustal. On Ubuntu that's `apt install ncbi-blast+ clustalw`\n",
    "* It's also assumed you have netMHCIIpan installed. See https://epitopepredict.readthedocs.io/en/latest/description.html#installing-netmhcpan-and-netmhciipan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os, math, time, pickle, subprocess\n",
    "from importlib import reload\n",
    "from collections import OrderedDict, defaultdict\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "pd.set_option('display.width', 180)\n",
    "import epitopepredict as ep\n",
    "from epitopepredict import base, sequtils, plotting, peptutils, analysis, utilities\n",
    "from IPython.display import display, HTML, Image\n",
    "%matplotlib inline\n",
    "import matplotlib as mpl\n",
    "import pylab as plt\n",
    "import pathogenie\n",
    "from Bio import SeqIO,AlignIO\n",
    "from Bio.Seq import Seq\n",
    "from Bio.SeqRecord import SeqRecord"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## load ref genomes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "labels = {'sars':'NC_004718.3','scov2':'NC_045512.2','229E':'NC_002645.1','NL63':'NC_005831.2','OC43':'NC_006213.1','HKU1':'NC_006577.2'}\n",
    "genomes = []\n",
    "for l in labels:\n",
    "    df = ep.genbank_to_dataframe(labels[l]+'.gb',cds=True)\n",
    "    df['label'] = l\n",
    "    genomes.append(df)\n",
    "genomes = pd.concat(genomes)\n",
    "scov2_df = genomes[genomes.label=='scov2']\n",
    "scov2_df = scov2_df.drop_duplicates('locus_tag')\n",
    "#print (genomes[['label','gene','product','length']])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_seqs(gene):\n",
    "    seqs = []\n",
    "    sub = genomes[genomes['gene']==gene]\n",
    "    for i,r in sub.iterrows():\n",
    "        s=SeqRecord(Seq(r.translation),id=r.label)\n",
    "        seqs.append(s)\n",
    "    return seqs\n",
    "seqs=get_seqs('S')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## find orthologs in each genome\n",
    "### blast the genomes to find corresponding protein as names are ambigious"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Alignment with 6 rows and 1475 columns\n",
      "--MFIFLLFLT----------------LTSGSDLDRCTTFDDVQ...HYT sars\n",
      "--MFVFLVLLP----------------LVSSQCVN--LTTRTQL...HYT scov2\n",
      "-MFLILLISLPTAFAVIGD-------LKCTSDNINDKDTGPPPI...D-- OC43\n",
      "--MLLIIFILPTTLAVIGD-------FNCTNFAINDLNTTVPRI...D-- HKU1\n",
      "--------------------------------------------...HIQ 229E\n",
      "MKLFLILLVLPLASCFFTCNSNANLSMLQLGVPDNSSTIVTGLL...HVQ NL63\n"
     ]
    }
   ],
   "source": [
    "pathogenie.tools.dataframe_to_fasta(genomes, idkey='locus_tag', descrkey='product', outfile='proteins.fa')\n",
    "pathogenie.tools.make_blast_database('proteins.fa', dbtype='prot')\n",
    "\n",
    "def get_orthologs(gene):\n",
    "    sub = scov2_df[scov2_df['gene']==gene].iloc[0]    \n",
    "    rec = SeqRecord(Seq(sub.translation),id=sub.gene)\n",
    "    bl = pathogenie.tools.blast_sequences('proteins.fa', [rec], maxseqs=10, evalue=1e-4,\n",
    "                              cmd='blastp', threads=4)\n",
    "    bl = bl.drop_duplicates('sseqid')\n",
    "    #print (bl.sseqid)\n",
    "    found = genomes[genomes.locus_tag.isin(bl.sseqid)].drop_duplicates('locus_tag')\n",
    "    #print (found)\n",
    "    recs = pathogenie.tools.dataframe_to_seqrecords(found,\n",
    "                            seqkey='translation',idkey='label',desckey='product')\n",
    "    return recs\n",
    "\n",
    "seqs = get_orthologs('S')\n",
    "aln = pathogenie.clustal_alignment(seqs=seqs)\n",
    "print (aln)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Seq('MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYL...HYT')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "spikesars = SeqIO.to_dict(seqs)['sars'].seq\n",
    "spikesars"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "sc2 = ep.genbank_to_dataframe('NC_045512.2.gb',cds=True)\n",
    "sc2 = sc2.drop_duplicates('gene')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## predict MHC-I and MHC-II epitopes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1_alleles = ep.get_preset_alleles('broad_coverage_mhc1')\n",
    "m2_alleles = ep.get_preset_alleles('mhc2_supertypes')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "predictions done for 11 sequences in 26 alleles\n",
      "results saved to /home/damien/gitprojects/epitopepredict/examples/scov2_netmhcpan\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (3) have mixed types.Specify dtype option on import or set low_memory=False.\n",
      "  exec(code_obj, self.user_global_ns, self.user_ns)\n"
     ]
    }
   ],
   "source": [
    "P1 = base.get_predictor('netmhcpan') \n",
    "P1.predict_sequences(sc2, alleles=m1_alleles,threads=10,path='scov2_netmhcpan',length=9,overwrite=False)\n",
    "P1.load(path='scov2_netmhcpan')\n",
    "pb1 = P1.promiscuous_binders(n=3, cutoff=.95)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name                      allele           top peptide        score\n",
      "predictions done for 11 sequences in 8 alleles\n",
      "results saved to /home/damien/gitprojects/epitopepredict/examples/scov2_netmhciipan\n"
     ]
    }
   ],
   "source": [
    "P2 = base.get_predictor('netmhciipan') \n",
    "P2.predict_sequences(sc2, alleles=m2_alleles,threads=10,path='scov2_netmhciipan',length=15,overwrite=False,verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "predictions done for 11 sequences in 8 alleles\n",
      "results saved to /home/damien/gitprojects/epitopepredict/examples/scov2_tepitope\n"
     ]
    }
   ],
   "source": [
    "P3 = base.get_predictor('tepitope') \n",
    "P3.predict_sequences(sc2, alleles=m2_alleles,threads=10,path='scov2_tepitope',length=15,overwrite=False)\n",
    "P3.load(path='scov2_tepitope')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GU280_gp01    70\n",
       "GU280_gp02    70\n",
       "GU280_gp03    38\n",
       "GU280_gp05    30\n",
       "GU280_gp10    19\n",
       "GU280_gp07    14\n",
       "GU280_gp09    11\n",
       "GU280_gp04    11\n",
       "GU280_gp11    10\n",
       "GU280_gp06     9\n",
       "Name: name, dtype: int64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "P2.load(path='scov2_netmhciipan')\n",
    "pb2 = P2.promiscuous_binders(n=3, cutoff=.95, limit=70)\n",
    "rb2 = P2.promiscuous_binders(n=3, cutoff_method='rank', limit=50, cutoff=100)\n",
    "pb2.name.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = P2.get_binders(cutoff_method='rank', cutoff=100, limit=50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## conservation: find identity to closest peptide in each HCoV sequence "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6 70\n"
     ]
    }
   ],
   "source": [
    "import difflib\n",
    "\n",
    "def get_conservation(x, w):    \n",
    "    m = difflib.get_close_matches(x, w, n=1, cutoff=.67)\n",
    "    if len(m)==0:\n",
    "        return 0\n",
    "    else:\n",
    "        m=m[0]\n",
    "        s = difflib.SequenceMatcher(None, x, m)\n",
    "        return s.ratio()\n",
    "\n",
    "def find_epitopes_conserved(pb,gene,locus_tag):\n",
    "    \n",
    "    seqs = get_orthologs(gene)\n",
    "    df = pb[pb.name==locus_tag].copy()\n",
    "    #print (df)\n",
    "    print (len(seqs),len(df))\n",
    "    s=seqs[0]\n",
    "    for s in seqs:\n",
    "        if s.id == 'scov2': \n",
    "            continue\n",
    "        w,ss = peptutils.create_fragments(seq=str(s.seq), length=11)\n",
    "        df.loc[:,s.id] = df.peptide.apply(lambda x: get_conservation(x, w),1)\n",
    "\n",
    "    df.loc[:,'total'] = df[df.columns[8:]].sum(1)\n",
    "    df = df.sort_values('total',ascending=False)\n",
    "    df = df[df.total>0]\n",
    "    df = df.round(2)\n",
    "    return df\n",
    "\n",
    "df = find_epitopes_conserved(pb2, 'S','GU280_gp02')\n",
    "#df.to_csv('S_netmhciipan_conserved.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Find conserved predicted epitopes in all proteins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GU280_gp01 ORF1ab\n",
      "6 70\n",
      "GU280_gp02 S\n",
      "6 70\n",
      "GU280_gp03 ORF3a\n",
      "2 38\n",
      "GU280_gp04 E\n",
      "3 11\n",
      "GU280_gp05 M\n",
      "6 30\n",
      "GU280_gp06 ORF6\n",
      "2 9\n",
      "GU280_gp07 ORF7a\n",
      "2 14\n",
      "GU280_gp08 ORF7b\n",
      "2 0\n",
      "GU280_gp09 ORF8\n",
      "2 11\n",
      "GU280_gp10 N\n",
      "6 19\n",
      "GU280_gp11 ORF10\n",
      "1 10\n",
      "162 282\n"
     ]
    }
   ],
   "source": [
    "res=[]\n",
    "for i,r in scov2_df.iterrows():\n",
    "    print (r.locus_tag,r.gene)    \n",
    "    df = find_epitopes_conserved(pb2,r.gene,r.locus_tag)\n",
    "    df['gene'] = r.gene\n",
    "    res.append(df)\n",
    "res = pd.concat(res).sort_values('total',ascending=False).dropna().reset_index()\n",
    "print (len(res),len(pb2))\n",
    "res.to_csv('scov2_netmhciipan_conserved.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "cols = ['gene','peptide','pos','alleles','sars','229E','NL63','OC43','HKU1']\n",
    "h=res[:30][cols].style.background_gradient(cmap=\"ocean_r\",subset=['sars','229E','NL63','OC43','HKU1']).set_precision(2)\n",
    "\n",
    "#res[:30][cols]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compare predictions to mateus exp results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sequence</th>\n",
       "      <th>Protein</th>\n",
       "      <th>Start</th>\n",
       "      <th>\"+\"/tested</th>\n",
       "      <th>SFC</th>\n",
       "      <th>CD4R-30</th>\n",
       "      <th>CD4S-31</th>\n",
       "      <th>hit</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>PSGTWLTYTGAIKLD</td>\n",
       "      <td>N</td>\n",
       "      <td>326</td>\n",
       "      <td>1/15</td>\n",
       "      <td>1067</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>[GTWLTYTGAIKLDDK]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>SFIEDLLFNKVTLAD</td>\n",
       "      <td>S</td>\n",
       "      <td>816</td>\n",
       "      <td>7/15</td>\n",
       "      <td>30487</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>[FIEDLLFNKVTLADA, DLLFNKVTLADAGFI]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>YEQYIKWPWYIWLGF</td>\n",
       "      <td>S</td>\n",
       "      <td>1206</td>\n",
       "      <td>1/17</td>\n",
       "      <td>200</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>VLKKLKKSLNVAKSE</td>\n",
       "      <td>nsp8</td>\n",
       "      <td>3976</td>\n",
       "      <td>1/16</td>\n",
       "      <td>5660</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>[VVLKKLKKSLNVAKS, EVVLKKLKKSLNVAK]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>KLLKSIAATRGATVV</td>\n",
       "      <td>nsp12</td>\n",
       "      <td>4966</td>\n",
       "      <td>1/17</td>\n",
       "      <td>187</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>[RQFHQKLLKSIAATR]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>EFYAYLRKHFSMMIL</td>\n",
       "      <td>nsp12</td>\n",
       "      <td>5136</td>\n",
       "      <td>2/18</td>\n",
       "      <td>787</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>[NEFYAYLRKHFSMMI, YLRKHFSMMILSDDA]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>LMIERFVSLAIDAYP</td>\n",
       "      <td>nsp12</td>\n",
       "      <td>5246</td>\n",
       "      <td>2/17</td>\n",
       "      <td>3870</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>TSHKLVLSVNPYVCN</td>\n",
       "      <td>nsp13</td>\n",
       "      <td>5361</td>\n",
       "      <td>1/17</td>\n",
       "      <td>160</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>NVNRFNVAITRAKVG</td>\n",
       "      <td>nsp13</td>\n",
       "      <td>5881</td>\n",
       "      <td>1/18</td>\n",
       "      <td>760</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>[VNRFNVAITRAKVGI]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Sequence Protein  Start \"+\"/tested    SFC CD4R-30 CD4S-31                                 hit\n",
       "0  PSGTWLTYTGAIKLD       N    326       1/15   1067     Yes      No                   [GTWLTYTGAIKLDDK]\n",
       "1  SFIEDLLFNKVTLAD       S    816       7/15  30487      No     Yes  [FIEDLLFNKVTLADA, DLLFNKVTLADAGFI]\n",
       "2  YEQYIKWPWYIWLGF       S   1206       1/17    200      No     Yes                                None\n",
       "3  VLKKLKKSLNVAKSE    nsp8   3976       1/16   5660     Yes      No  [VVLKKLKKSLNVAKS, EVVLKKLKKSLNVAK]\n",
       "4  KLLKSIAATRGATVV   nsp12   4966       1/17    187     Yes      No                   [RQFHQKLLKSIAATR]\n",
       "5  EFYAYLRKHFSMMIL   nsp12   5136       2/18    787     Yes      No  [NEFYAYLRKHFSMMI, YLRKHFSMMILSDDA]\n",
       "6  LMIERFVSLAIDAYP   nsp12   5246       2/17   3870     Yes      No                                None\n",
       "7  TSHKLVLSVNPYVCN   nsp13   5361       1/17    160     Yes      No                                None\n",
       "8  NVNRFNVAITRAKVG   nsp13   5881       1/18    760     Yes      No                   [VNRFNVAITRAKVGI]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.6666666666666666\n"
     ]
    }
   ],
   "source": [
    "s1 = pd.read_csv('mateus_hcov_reactive.csv')\n",
    "hits=[]\n",
    "w = list(res.peptide)\n",
    "for i,r in s1.iterrows():    \n",
    "    m = difflib.get_close_matches(r.Sequence, w, n=2, cutoff=.6)\n",
    "    #print (r.Sequence,m,r.Protein)\n",
    "    if len(m)>0:\n",
    "        hits.append(m)\n",
    "    else:\n",
    "        hits.append(None)\n",
    "        \n",
    "s1['hit'] = hits\n",
    "display(s1)\n",
    "print (len(s1.hit.dropna())/len(s1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The epitope selection method\n",
    "\n",
    "Promiscuous binders are those high scoring above some threshold in multiple alleles. There are several ways to select them that can give different results. By default epitopepredict selects those in each allele above a percentile score cutoff and then counts how many alleles each peptide is present in. We can also limit our set in each protein across a genome to prevent large proteins dominating the list. We can also select by score and protein rank. The overlap is shown in the venn diagram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "predictions done for 4 sequences in 4 alleles\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "GU280_gp01    20\n",
       "GU280_gp02     9\n",
       "GU280_gp03     5\n",
       "GU280_gp04     3\n",
       "Name: name, dtype: int64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reload(base)\n",
    "P = base.get_predictor('tepitope') \n",
    "P.predict_sequences(sc2, alleles=m2_alleles[:4],names=['GU280_gp01','GU280_gp02','GU280_gp03','GU280_gp04'],threads=10,length=9)\n",
    "pb= P.promiscuous_binders(n=2, cutoff=.98, limit=20)\n",
    "pb.name.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GU280_gp04    20\n",
       "GU280_gp03    19\n",
       "GU280_gp02     6\n",
       "GU280_gp01     4\n",
       "Name: name, dtype: int64"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rb= P.promiscuous_binders(n=3, cutoff_method='rank',cutoff=30,limit=20)\n",
    "rb.name.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GU280_gp01    20\n",
       "GU280_gp02    13\n",
       "GU280_gp04     6\n",
       "GU280_gp03     3\n",
       "Name: name, dtype: int64"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sb= P.promiscuous_binders(n=2, cutoff_method='score',cutoff=3.5,limit=20)\n",
    "sb.name.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAADqCAYAAABtE5PiAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAqZ0lEQVR4nO2deZTcVZn3P0/tVb2m00ln6XT2hASS0BAIS5BNlgwuOGPEQUZwXuTAvDqi48gsatMjeNQZZzyO8+I6zojigqKgIkdAdhJCIEBIQvalk3QnvXena6+67x+/6qSTdJJefmvV/ZxTp7urq+59flW/712e+9znilIKjUZTevicNkCj0TiDFr9GU6Jo8Ws0JYoWv0ZTomjxazQliha/RlOiaPFrNCWKFr9GU6Jo8Ws0JYoWv0ZTomjxazQliha/RlOiaPFrNCWKFr9GYyEi8qyI3G5COfeKyI/NsGkQLX6NpkTR4tdoToOIBJy2wSq0+DWaExCRPSJyj4i8BQyIyOdFZKeI9IvIZhH5wJDX3iYiL4rIv4lIt4jsFpFVpyh3qoi8JSJ/X/j7IhF5WUR6RORNEbliyGtni8hzhTqfBGrNvk4tfo1meP4SuAGoBrYClwFVQDPwYxGZOuS1KwqvqQW+BvxARGRoYSIyG3gO+JZS6l9FZDrwe+A+oAb4LPArEZlUeMtDwGuFMr8E3Gr2BWrxazTD802lVItSKqGUelgpdVAplVdK/RzYDlw45LV7lVLfU0rlgP8FpgJ1Q/6/GHgGaFJKfbfw3C3A40qpxwvlPgmsB/5MRBqAC4AvKKVSSqnngd+afYFa/BrN8LQM/iIiHxWRNwrD8x7gHI4fhrcN/qKUihd+LR/y/48AB4BfDnluJrB6sMxCuSsxGo5pQLdSamDI6/eO/5KOR4tfoxkeBSAiM4HvAZ8AJiqlqoG3ATn1W0/iXqADeEhE/IXnWoAHlVLVQx5lSqmvAK3ABBEpG1JGw7iuZhi0+DWa01OG0RC0A4jIxzB6/tGQAVYXyvqRiPiAHwPvFZHrRMQvIhERuUJE6pVSezGmAM0iEhKRlcB7zbqgQUxbxhCR/wH2K6U+f4bXLQR+DswF/lkp9c1x1HkvME8pdctYy9BYizRLAOOmL8MYCscwOp3BnPFDfw7+ngSOAP3AgGpSedsMPgGl1GYR+TqwBsgDPwJeGkM5aRH5c+B3wH8Dfw28H8NB+FMgB6wD7iq85WYM/0FXoe4fYTgfTUPMyts/CvH/AOhTSn3ahDrvpSB+EZkF7AaCSqnseMvWjBxpFsHwWE/CmAuXc0zs4XEWr4ABjMZgsEHoBg6rJtU3zrJLGicCGGYCP3OgXo1JFHrzOgzH1FQMwVt1LwlGI1J+0j+aJQEcxnC4HVBNqsMiG4qSMX9hItII/ACYDzzOsSEbIvIejPXLWcBm4E6l1Fsi8ifgcmCliHwDOK/w/vswpgG9wA+UUvcWyrkC+LFSqn5I2XuA25VST51g0vOFnz2FJdZrlFJrxnp9muORZolifEezMYTvBn9RFKMzmQkgzZIEDgL7gN2qSWUctM31jEn8IhICfgN8A/gWxtzlp8BXC43Cf2M4KNZjrGc+JiILlVJXicizGIL+fqGsacBHgU0YjpQnReQNpdRvRmnWuzCG/dV62G8OhR5+FkYDPR13CP50RIA5hcdKaZZdGGvyB1WTPpfuRMba818EBIFvKMNp8EsR+Uzhf3cA31FKvVL4+39F5J8K73nuxIKUUs8O+fMtEfkpxujgN2O0TTMOCvP3egzBz8KZqaEZBIAFhccRaZbtwDbVpHqdNcs9jPWLnQYcUMd7CweDEGYCt4rIJ4f8L1R4z0mIyArgKxi9fgjDQfTwGO3SjJFCL38WsAzDWVdMlAONQKM0y2Fgo2pSOx22yXHGKv5WYLqIyJAGoAHYiRG8cL9S6v4RlvUQxtRhlVIqWfAFDEZPDWAsDQFQCJCYdFIJBnpYNwYKol8MLGXIZ13ETAaulmZpBF5TTWq30wY5xVjncGuALPC3IhIsrF8Oxjp/D7hTRFaIQZmI3CAiFacoqwLoKgj/Qoz1zUG2AZHC+4PA5zn10lE7xjrsnDFeU0khzRKQZlmKsYHlIkpD+EOpAa6RZvlzaZaZThvjBGPq+YcELHwPw1P/OPBI4X/rReTjGL35fCABvMgxb/yJ/A3wdRH5FoZP4BcUghmUUr0i8jfA9wE/RkDE/lPYFBeR+4GXCg3F9UqptWO5vlFjRGwFC49A4SEYkV3Zoz+NjR+OIs3ix5hiLcNwkJU6tcB10iztGCOBfU4bZBemBfkUPcbIZSJQiTFaqSj8Xs7IG9EcRqBKH0awSl/h0YlS/WabfCLSLFMxVkWqrK7LwxwEni+FACIt/uEwevLJwJTCYzLW95JxjICVQ4VHu1kjBWmWIMae88VmlFcCZIF1qkm97bQhVqLFP4iRrqkBY3mrAWPlwUkyGM7TPcA+lEqPpRBplhkYiShOipDTnJFW4LliHQWUtviNHn4ORuTadNy7pp3H2A++E9g5khGBNEsEuASYZ7FtxU7RjgJKU/wi5cAijHXtqMPWjJYURsqozajheyRplunAVXjv2txMG/BsMY0CSkv8Rt61JRiBSKNJxuBWWoCNKHV0BUSaZRnGsmsxXJ/bSAFPqSZ1wGlDzKA0xG8kRbwQY2hfjBw+VMn6KZ9hIcYURmMdClirmtRGpw0ZL8UtfpEJGIkQZzlsiaUko2Rfvprcm7WoB+qQ7dFx76HXnJmtwAtOJhoZL8UpfmPX4YUY8/qiHv72VpNeexW+TPiYs3JdGfEH6gh1BF3rwCwW9gNPenXrcPGJ30i4uJLi25xyEt0TSa29ikAugP/E/yWF3LfrSD1dVXJhu3bTATyhmo5m7fUMxSN+kShwKSUS2985ieQrVxDMDyP8oawvI/6vUwnH/ad/nWZc9AGPea0BKA7xi8wArqREYtXb60i+ejmhvH9kG7P6fWS/PpXsa+Wl8fk4RDdGA5By2pCR4m3xG/m6zsdIB1YStE8hue5dhNQIhT+UpyqJ/786Ihmf6zPyeJXDwO+94gPwrvhFIhiBLPVnemmx0FtN+qVr8J9pqH862oKk7mnA3xXQzkCLOIDhA3B8B+eZ8Kb4RWqA6ymhePVElOzzq2CoV3+s9PrJ/OMMVEvY8f0LxcoejFUAV4vLe8M/kTrgfZSQ8DNBci+/m7wZwgeoyhH8t734F8XxzPzUY8zCyEPparwlfsOxdwPO77izjbyg1l5JJlFu7jXHFP77Wwhe3E/CzHI1R1kgzXK+00acDu+IX2QucB3u3XlnCW8vJ9E70RovfRB89xwkckM3nlqi8hDnSbMMm7jWDXhD/CLzMJx73rDXJFrrSeybZ22Qjh/kzsPEVnfqBsACBLiqsL3adbhfTCL1wBUUeZjuicRjZN642L7pzS0dRC/r01MAC4hh3L+uw93iN3bjXYPb7TSZvKDWXUFuuLBdq/CBfLqV8MKEdgJaQEMhU7KrcK+oRKqAVRgZcUuKzY3Ej1TZH40XBF/zfvx1aTwRpOIxLpRmOdWZE47gTvGLhDGE78q5kpX0TCC1Z75zm3HK8gS+0oKK5XB9kIrH8GEcFuKalSp3it+I06902gi7UaBevxTwOevfqM0S+nILGb/SpyCZTCVGOLorcJ/4jVN+G5w2wwm2n008XuGORBxzU0Q+1aYdgBZwtjSLK85NcJf4RaYAy502wwkSUbI7znZXws0r+4it0EFAZuPDOB7NcdwjfiP7zlWU2JLeIG8vJz3SLbp2cncbwcqsnv+bzMxChmVHcdPNtoISitcfSl8V6UPT3dXrD1KeJ/DZVr38ZwEXS7M42tG5Q/zGZp1FTpvhFG+fTw6xdsSTziG3PMLn7/gtnxjtexvjxC7Vw3+zqcHhe9558Run5lzmtBlO0VVLqqvO+l7/39dwdXWE1rG+//+2EYzk8WymWpey3MmlP+fFD0sxWsGS5O3zrV9O29RO9fYullwxixfHWkZFnsCdh0iaaZeGCMYhMo7grPhFYowhBdf5cGsM/q0Wmgafuw7eNwm+OBm+MAc+tcYDx1B31ZLqq7E+kOm/1nHTh8/mVz4ZX0NzRR/RSRkd/WcyZ0uzOJJc1emev5ExbNH9CLz8X/DNoc99G/7YDv9yGL60AjbeDe8xy0ir2H629V70B99iSSxI/zVz2Tfesvwgf91O1gy7NEeJAPOdqNg58RuHZZ41lrd+BrbPgIGhz83m2JA0DiHB3dFpySjZ9inWz/W3djBvby/LPvQwX35oIx8/NMDCT/6Bvx5reRf3E9G9v+k4sunHycQYjWDurrVr4Ma1cFEYEk/D180s22x2LCKFz/qDRe67il8DvwZ4ZAsLntjBtf+5iv8ea3mDvf9Xp5XehisLqZZmmW73AaDO9PxGr7/Q7GKfhN/0wz9cCq/cY+wPcCU5H/mWOd7dtKR7f0uwfdnPqWH/Uivr/jSsW+/iXP6tDSRzQftP0PnzRWz77nv51njL0XN/S5glzWJroJf94hcJAAvMLvZ3MHnw9+/CssnQZnYdZtEyx/shzBf3E6nN6AbARHxYMBo+HU7M+ecwzuy7S+D2XbAgCeXl8NWb4LGXYcltUCegJkDnD+En5phrLpkgua5J7ti5Nx78INf1kvpJbWklVLWY2cAbdlVm/6EdIjcypJcuNfbOI7HxAnfG8Y+WtiCpj8/xfkPmMh5UTcqWUGp7h/3GSTslK3yA/bOctsA8pmQIz0yRdtqOImOGXRXZPWQzfa7vJTJBct0W5eB3iut7yHynrggOUdnDBH7Nx0gXMkjN4XlW8yfepJ4nuYU8QYQcV/IQy9ljoSUNwDYLyz+K3eKfaXN9rqJ9Khl8x4s/r3wqninPDmSqcgPpqvxAuop4poJkrsyXVwLK58sjCkSUEhSCUkLAl1HhQCIfCQyoaGBARQIDEg0ckUhwwFcW7AuE/ClbVhNW9hP6Tp0dNVlMgDxX8Esa2UcPYb7N59nEFp7hLzif33Ilm3iGc3iBv2C5pTEk9dIsPtWkLN9EZZ/4RarxQLy9lRyoC2YPDUxJdsan5boSU3wDmcpAOhcJgAQZZZbiVA4GMtWn/H/In8hUhLqy1ZH2fG3soL8m2hYK+LKmT/OqcwQXx0ltjnl87l9PL/X0AlBNijJa6aQagGTBR5MgSpgeiy0JAVOAgxbXY2vPX3K9vkJUB+em2rgo18GywKPJTcHEgZQtIknnosHOxPRgZ2I6O7vPBfKqPNiXrom1ZuortgcmxtpMs+P6XnKbHcs3bAG7mEg/M1jCbmr5Ob/hU2zggyiEm/mqDRY0oMXvPRSi2jkv1cLV+XbOC2UpiwCk/elcImTPUHx4fHIkUx060lsd2te7iJA/kZlStjfdUPVOYEL08LgagnPiRRTq20uYX3Eny/kFE0jyCJeznF9wLRt4gvN5lFu5m/+w2IoGYK3FddgkfiM/XzHMDE9JPzPSO1idbWNFOEfsJKdeZ0VnGtyzxJfORYP7+s4K7us7i7A/nplSvjvdULU1UB3pGHVDMClLMJYjF/fbH7VoKin8/JA7mckrXMsGAA5yMR/j5wBcy2us46M2WFItzRJQTcrSICq7ev5JFGFiToWog6xM7uID0sv8CKcJXmqvaHdtFpxULhbc23t2cG/v2VSGO5JnTVyv6sr3jaqhOjtB+tVy9zRuoyYP/JCPUkkrH+Kpo8+H6GUNC7iUbbzEWUQ5bJNFNWBtXXaKv2jIEczv4sbkbt4fTFM1ohu+q6LLE71iX6o2su7g9cSCvakFE1/P1VfsiIqoMzbc58TJv+rl9KvrmEcbFxHjAF/jCwBcwK+5nAd5gZt4CR8+MlzNgzZZNAGLxW9PhJ/ItcAs6yuyljx+tZc/S2znpmCaqlHNcx8/9/Fczp/zRAMwlLB/ID2v5s3srOrNUZ/kT9kIbImQ/NzM4ophcJi3VJOydN5vV8/v+ai+fVwT38pfBVNMGLVfO+vL5r0ofIBUriy0qf2S0PauxszSyS9kp1bsGXakMztVRE4/d2B5XkvrxW/k6fPsQlA/M9Ib+Gy+jzljvoZkMJnD+ZRp4yKdiwbXt14bnNhzINE45dlgNDhw3L0TUfinp0gfCBdBtJ87sFz8dtyQngzsyRHMv83HB57jP4N9zBnXcDYejhfNiTedienRP+25ybe187y4UnLcnPGspN7iayIxq9N62zHs95wbqIMlyQ38nS/FRFPSbCVCCdd6+sdCXgV82zqXx/b3LUg1TnmGmuihMMCUjLvzJnqQGizMS2FHz19hQx2moBC1mdvia7kvnGKiaa1uPBQ3qyhXEc9Uhl9qeV9oc/uKAaVETcoU33Kuw1g6XdY9f4EU1dl1fDHby3zTP/B4uDjFbyCys3tZWWdiaqI8/EcFxXyttmOpE9WOnt/14m+nMfkM36YQqGM6yWCy6HvEnuTk6PYdt/iIL9KHepqH58Xv6rXf3dwQf4WmcJYyy0ZByWDS057+keLPRgK0fDlI9w26+zcHSx1+dtyUrs3x9jYfH9jEnTHwW9ozp4Ip134GZhLKZwUCPg7fGePw/xk48zs0Z8Dz3n7XBbfk8avX+IfEIS6y/NAMADWC8NhiIJDPHvuuu28sI1cRZ+o3PBvj4QI8P+x3Va+Xx6/W8OXkIS6y7aaUE9bDixUfSCCfPbas2Xd1jP1NcZS/JK7fAjwvftf0/Hn8ai33J7tZbO/hCMpXMjd/XnzHj3IGlsfY9+WkbgDGhOfn/K5wduXxq1doTnZxtu3bTktF/Dkkf5L4AZKLo+z/YoISGQGZiKXasUOYjoe2KkS9yhcTnSxzZL+5L++K9s9ysr7AqSMZ4+fFaPu0LfnoiwhLz0O04650PN77DT6daOc8xxxPpdLzZ3xnGNr3XRnj8O16GXDkeF78jp7mup3V8QNc6ajH2ad8RRXbfyqy4j/zdXa/P0bX+3UDMDIs7TjtEL9jEV+HuCCxlVscTy1VKsP+jC8wshFO+8eiOhJwRHi+50/aUMdJDDA18zqfC8EwDiibKZVhf3qk4scvHPiCn2y141NCl2Opj8QO8dse6WV49u/N54i4YpmxVHr+tD848kYuXxFg/71ZvQJwWiztOO24K3ttqOM4NnFHPM4015wgE81ES+IGT/tGIX6A1NwIh+/U8/9T4/me31bxd7AkuZfrXRVSWhWvcnzqYQfdofLRX2fP9TE9/z8lljaMdoTe9tlQBwAZYrnXucfvhnn+UKwW/yNbHrm1O9G9JOAL9N967q3NAN2J7tgTO564I5VLTQz7w52r5q/6bnWk2tKbqS06cQzTLJ/Q+jlh9h15fJnSmB+NDAXWngtox4fdB/akd3qTv02NNqW2HVQmKi21aV7NvJdXNqz85tDnXtz34qqaaM07t5172xdqojXvvLjvxeuttCEPqiNSNbZw1GxtiPbbdQDQ8fSoJmVpgJz14lcqjw1D/y4Wpdq41FXD/UGCuaA/lAlZtmyztG7p9vJQ+XGO1c5E57LGqY1rABqnNq7piHeca1X9AH3B8sywob0jpef6GImFevh/jC6rK7BrmHXIysIVot7kU1ZWMW7KUmW2Lmtl89nKyWWTewEmxSb1ZvPZSivra49UjfP6fEKbu79Dm+m0ugK7xG9ZBlKAPbwnMcB013j3h6MyXulYlJ+IgMVTr7aoCWnm0zPC9F6tvf8GRdPzt1pVcIZY7h1ucbXwAaoSVbY6swK+QN/hgcNVAIcHDlcFfIF+K+tri9aY4zxuvy2ot/8CRdPzK9WHRcsWW/lIMkfMFcE8p6Mqbq/4J0YnvrmhdcPFABtaN1w8MTrxTavqyoov3xOqMMepmasO0vmhUu/9k6pJWR4cZ89BnQAi7wbmmFlkhljuSX5EnrDrxZ+TXP7xxscFMT+3/cObHr69P92/IJvPlgd8gf65E+Y+tqRuyRtP7HjijnQuXRPyh7pWzVv1nQnRCZaIqiNcmXxk5uXmJWqVZI65t4I/7vrv1SIOqCb1e6srsTPF1h5MFv92bkrmCduSh2+8+JXfV5GsSPZH+03PZrz67NXfH+75m5fc/B9m1zUcu8qnmevPUBE/3R8YoPYnnvhuLaDFjkrsHIruw8TEHlki+b1c7+q04CdS11PneGITs8mDeqeqwXyfS/cN4RKe+++xoxL7xK9UGthvVnE7+UDCC3P9oUzvmu66AKTxcihSk0oGLJh25SsC9L67FAN/elSTsiUq1u5wyl1mFbSP6zwnpMpkZcjKYB8neKe6wbolzK4PeqpxN4m9dlVkt/j3YsLQv53GpJkHadrJ5L7JRSP+rPhyu8qnWTf1ykwJM9DoSD4IBylS8RtD/53jLWYX7/NsWqwZHTOKZvPK3rK6dM7nt/Z6elZ59rseA0ksjoYdihM34tvjeXOa8lwH53rK0TeU2iO1kWIZ+m+unmX97smB88Lkg6XSAOxTTXatvTshfqU6GEfr1sI1KUXA073n1J6pnhd/3B/OtMZqrW+EVdhP/8pSGfrvtrMyp47S2gjUjeWNray0vLe5m7tvbaFlSZhw/0M81AzQQkvsXu69Y4CBiWWUdTbT/N166scUNNPQ0eDfO8m2qZ0lbK2akcbi46SO0nuNUPWMLVU5SD/GcrhtONWD7gGOjPZNGWK5HuZa3ttcwRUv38Vdx+2Pf4AHVs1m9js/42dfmM3sdx7ggTHvj6+OV4ejqWh6/JY6Q1r8uTdq5tuXFTmxOEIuVnQxEiewyc4hPzglfmOP/4bRvu0QK1JWH6cNcCM3bq+l9rjY6l3sWraa1WsAVrN6zU52njueOha0LvBs5tqNE+YmMz47p15+YeACzzaWIyADvGN3pU7OnbcyyjRFB7nMsfRcKVKVC1nYCzCf+b0pUuPaHz+jc0Y0nA577oZO+YLZN2vm2n8WQv/FxRztt001KdvvBefEb/T+r47mLZ2c7Yq1fZ/xsY3rZhREvNj7b6iZl87a2usXSJzjuaCuUbDRiUqd9ZortRs4PJKX9jMj7WQ4b5hw31a2VgFsZWtVmPC498c3dDZEvbTs1x+IpjdOcKDXB8hVBUnXeeazGgX77ArnPRE3LJmtHcmLOljqaC85m9lvPszDFwM8zMMXz2HOuPfH+5RP5rfN98wN/WLdkrwqpAVyhIHlnvmsRoEjvT64QfxKtQFbzvSyLs6x7ab7BJ+4/T7uu+cIR+pWs/qr3+Sbl97FXU/sZveiD/PhL+1m96K7uOsPZtQ1q32WJ3r/g9GJiZayOmeDq5ILHa3eAg6pJnXAqcrtS+ZxWiskCKwGyk/1kqf5fjpBnSvm/Gazo25HfEv9FldmHgYjU88vZ16e6wuVOzvvDu1JMvuTno3uHIZHVZOyLZz3RJwK8jkepTKIPA/82XD/zhLOJ5hUtA6f2YdnR3ZM2ZHNBDLu+D5O4MXJS5J9ofLRNU7719ax7lt3HP07M1BL/UWPcek9T4/ZkMzUYroHdjkpfHCL+AGU2o/IVuCksd0A07LgK8peH4wsP/Pa5iW31G9xz/dRYEfFtPi2qobRj0rqLzpE/UVfAiCXFn5189eYt2rUsR3HocJ+0nUZQoe83gjkgFecNsL5Of/xrGGYAz6OMKPYo7uYe2hutCJR4aoY9t5gLPVcnQmbqLb8ehGh8nbqlo4/HXXSe8ujw/CmalKWZlMeCe4Sv7Hl948YEU9HOcKMot/VJYgs37ncL3lxxbVmxZf7w/QVPlO27O5fcwGTz1lnglmQNjlfoP30MYboVitwl/gBlOoGnhv61BGmu+rgTasoT5UHzzp4lit6/+frlqVNcfClB/z07lvKWTe+ZoJZkJ3k9XvhJavP4Bsp7hM/gFK7gDcG/0xQ5/UvfMTMOzQvVjVQ5WgDsK2yPr6jst6cYJ7NvzyHaM0+auaZM8zNTjSlGIfYoZqULZl5R4I7xW/wKoUtjimqSiqX2/Jdy/2+vM+R3qE7VJ56vm6ZeVF8B9ddyJRlowrjPi3ZCW6+Z09HD/CC00YMxb0fpBGA8BTQliXmXjstIJaOBRftX2T7Ro/uUHnq0RkrA+M6bXco8c4Q/a2LWPRB8+a4OU92BFngKdWkXBXM5W5RKZUFnjjIRM/tfhsvc9rnRCccmWBb6uquUHny0RkrA2l/0DxxxSamuemRz1Ax1bzryHmyI3hZNSnLD94cLe7/IJVK383l4UN4N/nFWFm+a3nQn/NbPvzvClUkH224LGSq8C3DAyYez3bVpGzfqz8SXC9+EQJ9hP2fZaXvILGU0/bYSSQTCVyw84KMKOuW/zrDlcnfNKwM2ZucYxwok6Yk9tCDy+b5Q/HCF+4H6CEc+BTvCm6juqROcZnUPynSuLsxhRpf/oDh6AhXJh+dcWnIkf35Y8Yz4s8CT6om5dqgJC986UeHvUkCvs9xSWQ9k0rqCOfp3dOji/cvNrXROxypTjzmOeGDHWncTCCPIfxupw05HV744nPH/+GTZlbEHmem5eeXu4m5h+fG5rbNNeWat1XWx39bf0nYe8IvoMQFW1FPiQKedtN6/qlw/ZevjOHuSXPeB1hS9jUaEymcWQ93gsUHFpdN65o25lFPWvy5J6een3h2SmPM8pN2LCOTR5Sbe//nVJOyNf/+WPHKDTCswF9gevSTXJ4vJUfgebvPi9b21Y66AegMVyYfnnWl2l0xzZk0XGbhS7k5tv9F1aS2OW3ESPG0+AFaKQt+gsuDzzC9JPwAgsiFOy6MVMYrRxQCrEC9OWHuwCMN7woPBKOu2zI8anxJt4p/rWpSm502YjR4Rfyn7dkz+H3/TmPsn7go2UGk6OMB/Mrvu2TbJcEzbQFO+EOZ39VfnHpl0uIyR3PvmYkv4cb5/uuqSb3ltBGjxSviH9HpPhupjdzBlYHHmDWQG2dqbbcTzAX9l225LDSp9+SVDwVqT1ld/BezrvTZcp6enfgSbur5FcYuvfVOGzIW3JHD7wyI8C7grNG8ZyZ96b9hY24x3d6e446At2a8NbB38t4ygPZwVfKFuqXSEakOO22XJVS8EGfa19yQ7zCNEa+/32lDxopX5oCjPtdvL5Whe7iUJXQk7+RtGjhSXD3gEJa2LC07kp7U97PlBHZXTnWDMKwj5Aqt9QJPqCZ1UtYpL+GVnn8BcMV4yriUg4mP8o5vGvGi6hEPE0k/xMLs08yIUdeeZOVrAUJZrzTqo2fq1xJUvuDkaG4/Ro/ved+SV8Q/FXivGWUtoSO5mh35pXRE/eBJJ1ge1BZqkg8zT15j8vEjmmgiy1Vrc1QOFFUjd5SZf5smstupZK6bgDWqSbnJ7zBmvCL+CPBRM8ucRDzzQXamL+NguAJ3psw+kTj+3PNMT/6SuaFDlJ06xZbkFY2b48zfE8PnzQbulMz/YB5fym5HdRpjW65n1vBHgifEDyDCXwIVVpS9hI7k1ezPr6AtVI67hswDBLLrmZx+mhm+N6gNK0axZFfTk2LleqEsWRxpz/2daebdZve17MeI2iu6cHIvif9qYK6ldaDUOXSmLuRQbimd/gb6QwGUrb1MDtQBylNvMzG3lin+UQv+RHy5POdtTjBvbwzx+CigfG2c6ffb5dBMA+u8FrgzGrwk/qXARXbWGSSXP4eu9Lm05+bS62vgSKCKVMBnkq8gB6qbSKaVWG4nVfnXmezfzIRQCgs23NR2pbj0dSHm4VHApO/HqXnUDvHvwhjmF3XUqJfEPwV4n9N2BMnl6zmSredIroaUmkCSatJUkaKSzNFGQTA2nyhE9RFUvYTpJaR6CEsHEdlDpf8gZcGcnfvTJa84a1eCxTtCnlwRsN7Z14ch+n0W1uEavCT+AHAb3olKdC/+bJ4l2xIs2BPFn/fG5ynpPPM/KBbt6Bs8SGN7sXjyR4JnWn+lyIrQCkx32hbPkwv4eGNxGVvm5jh3ywCz9rt/VSC8I4Uos9f3+4DXMfLpl4zoB/GM+AvsRYvfPFJhP6+cW8bGBRnO2Z5h5sEwgZw7M2RWvmjmELWXY6L3xtDXAjwz7AcQoRy42Wk7ihZ/Ns+cliQL9vjdFSSUV8y9NUegZzydlQJagXeAnaUs+kE81fMrxRERuoAap20pSnIBH9tnx9g+GyZ2p1i0M8e0QxH89i53nkRof4pAz1j3ZnQB2zF6+aJbqx8PnhJ/gT1o8VtP54QwLy6HYCbHjNY4Mw/CpK6IIw7C8rWjTdV2BNiBIXjXHZbhFjw17AcQYQKw2mk7ShJfLs/U9jT1bXmmdARsixmYfUeGUOvpTgxOA4eANqBVNak2W+zyOJ7r+ZWiW4Q2YIrTtpQceb+PA1MiHCh89NFEltruDLXdeSb0+qg6EiCSHv+x3kMJ70wSaj1xyN/PMbG3Ad16Dj96PCf+AlvQ4neeRDRASzRAy7Rjz4VTOWp6M9T05IglIZISoikhnPYRSvsJZP1nXFbM+nOkgnlSoRzTtxzA8M73YYi+t9gj7+zCc8N+ABH8wC2AizzSmhETShtzeCWgRFBDfxc4tpchCfxEqVMncNWMHU/2/EqRE2EbsMRpWzRjIB0aaSzBVi186/BGaOfwbKHIk3SWOHmM5Bkai/Cs+JWiB9jptB0ay9im1OhzN2pGjmfFX+B1dO9fjOQxvluNhXha/IXef4fTdmhM5x3d61uPp8Vf4DWGOchT41lyGNtrNRbjefErRR9QVIkVS5wtSqFj8G3A8+IvsI4znOen8QRxwJNHX3mRohC/UiSBtU7boRk3LyuF5w/D8ApFIX4ApdiKsV9b401alGKX00aUEkUj/gIvoJ1/XiQLvOi0EaVGUYm/sPSnPcXeY71S9DttRKlRVOIvsAFju6fGG7QoxVtOG1GKFJ34lSIPPI32/nuBAeAZp40oVTy5pXckiDATuM5pO47n/FthyxKI9UNHs/HcqvfCCyshVohou+vX0Py2czbaRh74nVLorDsO4cktvSNBKfaKsAFodNqWY3zkZZjwDPz9x45//rqn4FdPOmOTY6zXwneWohv2n8B6jFz/LuEz22GGjl6DnUrxhtNGlDpFLX6lUBjzf5c7AJ+8EiZ90ZgWbLXrFFqnOICe57uCohY/GMd8AU8A3U7bMjz3PQuH/xkOfglqeuHmYs5M3AH8seCU1ThM0YsfQClSwOPgxm2i5/dDREFQwT0vwN5ZTltkEX3AH5Qi47QhGoOSED9AYafY4xhJIV3EK1XHfv/uuTD1oGOmWMcA8LhSJJw2RHOMol3qOxUiVAE3AOX2177kdti1AJLlEO2Hmx6D1xbC/noQYEInPPhjWNFrv22W0Qv8XifncB8lJ34AEcowGoBqh00pdjrRPb5rKUnxA4gQAVYBk5y2pUg5hDHH11t0XUrJih9AhCDwbmCG07YUGXuAPxVWWjQupaTFDyCCAOcD5zltSxGggFd1AI83KHnxDyJCA3Al+giwsZIEnlKKYlytKEq0+IcgQgVwDVDrtC0e4zDwpE686S20+E+gcAjoecAySigOYowMptl+Q0fteQ8t/lMgQg1wOXo14FQcAp4rZE/SeBAt/tNQcAYuAZZTxNufR0kGeBXYVNg4pfEoWvwjoOALuACYixGKV4ooYDvGPnwdrVcEaPGPgsJU4AJgptO22EwLsE4pOp02RGMeWvxjQITJwIXANKdtsZgDGD29y/MhaMaCFv84EKEWOBtjOlAsPoEsxvB+k1J0OW2Mxjq0+E1AhDCwEFgMVDpszljpBTYB23Q8fmmgxW8yIkwDZgOzgDJnrTkjRzDi8HfpZJqlhxa/hRR8A7MKj2onbRlCD4bgdytFu7OmaJxEi98mCjkEpgJTgMlADdZHEGYx8ua1YYTgHtJ76zWDaPE7RCGMeAJQBVRg+AoqCo9yRt4w5DDOte8rPPoLP3uBbh12qzkVWvwuRQQfxgpCsPAzgBFglMHo0TNARkfZacaKFr9GU6LoXWsaTYmixa/RlCha/BpNiaLFr9GUKFr8Gk2JosWv0ZQoWvwaTYmixV9EiIH+TjUjQt8oDiAi94jIARHpF5GtInK1iPhF5J9EZGfh+ddEZEbh9ZeIyKsi0lv4ecmQsp4VkftF5CWMMN85InKWiDwpIl2F8j/k1LVqXIxSSj9sfGDs+28BphX+noWRDOTvgY2F/wtG6vCJGBuAuoG/wgjx/cvC3xML738W2IeRVCSAsVegBfhY4e9GjM09i52+dv1w10P3/PaTwzgVaLGIBJVSe5RSO4Hbgc8rpbYqgzeVUp0YpwlvV0o9qJTKKqV+CrwDvHdImf+jlNqklMoC1wN7lFI/LLx+A/ArYLWtV6lxPVr8NqOU2gHcDdwLHBaRn4nINIzDQncO85ZpwN4TntsLTB/yd8uQ32cCK0SkZ/ABfARjK7FGcxQtfgdQSj2klFqJIVQFfBVDwHOHeflBTs4W3ICRXPNokUN+bwGeU0pVD3mUK6XuMu8KNMWAFr/NiMhCEblKRMIYh1smgDzwfeBLIjK/4LVfKiITgceBBSJys4gEROQmjFyBvztFFb8rvP6vRCRYeFwgIotsuDyNh9Dit58w8BWOZdiZDPwj8O/AL4A/YiTj+AEQLcz73wP8HdAJfA54j1KqY7jClVL9wLXAhzFGDW0YIwt9+rDmOPR+fo2mRNE9v0ZTomjxazQliha/RlOiaPFrNCWKFr9GU6Jo8Ws0JYoWv0ZTomjxazQliha/RlOi/H9cSJJSNotBAwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from matplotlib_venn import venn3\n",
    "ax = venn3((set(pb.peptide),set(rb.peptide),set(sb.peptide)), set_labels = ('default', 'ranked', 'score'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "b=P.get_binders(cutoff=10, cutoff_method='rank')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 194,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GU280_gp01    39\n",
       "GU280_gp02    35\n",
       "GU280_gp03    29\n",
       "GU280_gp04    20\n",
       "Name: name, dtype: int64"
      ]
     },
     "execution_count": 194,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "func = max\n",
    "s=b.groupby(['peptide','pos','name']).agg({'allele': pd.Series.count,\n",
    "             'core': base.first, P.scorekey:[func,np.mean],\n",
    "             'rank': np.median})\n",
    "s.columns = s.columns.get_level_values(1)\n",
    "s.rename(columns={'max': P.scorekey, 'count': 'alleles','median':'median_rank',\n",
    "                 'first':'core'}, inplace=True)\n",
    "s = s.reset_index()\n",
    "s\n",
    "s.name.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 197,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GU280_gp03    10\n",
       "GU280_gp02    10\n",
       "GU280_gp01    10\n",
       "GU280_gp04    10\n",
       "Name: name, dtype: int64"
      ]
     },
     "execution_count": 197,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s=s.groupby('name').head(10)\n",
    "s.name.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}