Technical documentation
The Scoutender methodology
This page details the technical pipeline that separates noise from opportunities genuinely relevant to your firm. Our goals: no dependency on paid LLMs, full transparency on classifier decisions, and openness about geographic expansion.
Why a dedicated Africa classifier?
Global platforms (DevelopmentAid, Devex…) cover international donors widely, but their geographic and sectoral filters remain coarse: too broad for Africa-specialized firms, too English-centric for Francophone organizations, and too expensive ($200–500/user/month) for pilot use. Scoutender was designed to fill this gap with an exhaustive bilingual FR + EN classifier (~700 keywords), a sectoral taxonomy aligned with CPCS-core expertise (energy, transport, PPP, capacity building), and strict geographic filtering on the 54 African Union member states + Haiti.
The three pipeline stages
Stage 1: Extraction. Depending on the source, we use an official API (TED Europa eForms, World Bank procurement, OCDS Rwanda Umucyo) or HTML/JS-aware scraping: httpx + BeautifulSoup for server-rendered portals (national ARMP, BADEA, KfW, MCC, OPEC Fund, IsDB, LuxDev) and Playwright with headless Chromium for JS-heavy portals (GCF Oracle, UNDP, ARMP Burundi).
Stage 2: Classification. Each item passes through a deterministic classifier (~700 lines of Python) that detects procurement keywords, sector (energy / transport / PPP / capacity), target vs non-target country, and notice type: AO (appel d'offres, i.e. call for tenders), AMI (appel à manifestation d'intérêt, i.e. call for expressions of interest) or RFP (request for proposals). Scoring is multi-factor, with bonuses for strong donors and target-country mentions.
Stage 3: Enrichment. For kept opportunities, we fetch the detail page (HTML + Terms of Reference PDF) and extract, via regex heuristics, the description, components, qualifications, required experts, and exact dates.
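As an illustration of the notice-type detection in Stage 2, here is a minimal sketch. It is not the production classifier: the NOTICE_PATTERNS table, the normalize helper and the order in which types are checked are assumptions made for this example, and the real keyword lists are much longer.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical patterns, written against accent-stripped lowercase text.
# The real bilingual lists are not reproduced on this page.
NOTICE_PATTERNS = {
    "AMI": [r"manifestation d'interet", r"expression of interest", r"\bami\b", r"\beoi\b"],
    "RFP": [r"request for proposals?", r"demande de propositions?", r"\brfp\b"],
    "AO": [r"appel d'offres", r"invitation to bid", r"invitation for bids", r"\btender\b"],
}


def normalize(text: str) -> str:
    """Lowercase, strip accents and unify apostrophes so FR and EN patterns match."""
    text = text.replace("\u2019", "'")  # typographic apostrophe to straight apostrophe
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def detect_notice_type(title: str) -> Optional[str]:
    """Return "AMI", "RFP" or "AO" for the first matching pattern, else None."""
    norm = normalize(title)
    # Dict order is the check order: AMI first, then RFP, then AO (an assumption).
    for notice_type, patterns in NOTICE_PATTERNS.items():
        if any(re.search(p, norm) for p in patterns):
            return notice_type
    return None
```

Stripping accents before matching means one pattern covers both "intérêt" and "interet", which matters when portals are inconsistent about encoding.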
Sectoral scoring: what we look at
Scoring combines 5 dimensions: presence of a procurement keyword (5 points if in title), energy/transport sector match (3 points), CPCS-core PPP/capacity match (3 points), target country detected (3 points), strong donor in STRONG_DONORS (2 points). Keep threshold: 5 points. Every keyword list is strictly bilingual FR + EN: "appel d'offres" matches just as "tender" does, and "ingénieur conseil" just as "consulting engineer" does. "Menu title" patterns ("tenders in...", "view all tenders", "home/accueil") are systematically rejected.
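The point values above can be sketched as follows. This is a simplified illustration, not the actual classifier: the keyword tuples are tiny hypothetical subsets, the Item dataclass and helper names are invented for the example, and matching is plain lowercase substring search. Only STRONG_DONORS, the point values and the threshold come from the description above.

```python
from dataclasses import dataclass

# Tiny illustrative subsets (assumptions); the real lists hold ~700 keywords.
PROCUREMENT_KEYWORDS = ("appel d'offres", "tender", "ingénieur conseil", "consulting engineer")
SECTOR_KEYWORDS = ("energy", "énergie", "transport")
CORE_KEYWORDS = ("ppp", "capacity building", "renforcement des capacités")
TARGET_COUNTRIES = ("cameroon", "cameroun", "rwanda", "haiti", "haïti")
STRONG_DONORS = ("world bank", "afdb", "isdb")
KEEP_THRESHOLD = 5


@dataclass
class Item:
    title: str
    body: str = ""
    donor: str = ""


def contains_any(text: str, keywords) -> bool:
    return any(k in text for k in keywords)


def is_menu_title(title: str) -> bool:
    """Reject navigation entries such as "view all tenders" or "home/accueil"."""
    t = title.strip().lower()
    return t in {"home", "accueil", "home/accueil"} or t.startswith(("tenders in", "view all tenders"))


def score(item: Item) -> int:
    title = item.title.lower()
    text = f"{title} {item.body.lower()}"
    if is_menu_title(title):
        return 0  # menu titles are rejected outright
    points = 0
    if contains_any(title, PROCUREMENT_KEYWORDS):
        points += 5  # procurement keyword in the title
    if contains_any(text, SECTOR_KEYWORDS):
        points += 3  # energy / transport sector match
    if contains_any(text, CORE_KEYWORDS):
        points += 3  # PPP / capacity match
    if contains_any(text, TARGET_COUNTRIES):
        points += 3  # target country detected
    if item.donor.lower() in STRONG_DONORS:
        points += 2  # strong donor bonus
    return points


def keep(item: Item) -> bool:
    return score(item) >= KEEP_THRESHOLD
```

For example, a title such as "Appel d'offres : ligne de transport d'énergie au Cameroun" published by AfDB would collect the procurement, sector, country and donor points in this sketch.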
Geographical coverage
Scope: continental coverage of the 54 African Union member states + Haiti. Countries are grouped by economic union (ECOWAS, COMESA, SADC, extended North Africa) and by geographic zone (Central Africa CEMAC+DRC, Horn of Africa, Indian Ocean, Caribbean) so that the country selector is easy to pre-fill. More than 100 out-of-zone countries and regions are blacklisted (NON_TARGET_COUNTRIES) so that out-of-scope opportunities are rejected automatically, capitals and sub-regions included (e.g. "Issyk-Kul Region" → Kyrgyzstan → reject).
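The blacklist lookup can be sketched like this. Only the name NON_TARGET_COUNTRIES comes from the description above; PLACE_ALIASES, the two functions and the small sample sets are hypothetical, and the rule that a target-country mention takes precedence over a blacklisted one is an assumption of this sketch.

```python
# Small illustrative subsets (assumptions); the real lists are far longer.
TARGET_COUNTRIES = {"Cameroon", "Rwanda", "Haiti"}
NON_TARGET_COUNTRIES = {"Kyrgyzstan", "Vietnam"}

# Hypothetical alias table: capitals and sub-regions mapped to their country.
PLACE_ALIASES = {
    "issyk-kul region": "Kyrgyzstan",
    "bishkek": "Kyrgyzstan",
    "hanoi": "Vietnam",
    "kigali": "Rwanda",
    "yaoundé": "Cameroon",
}


def resolve_countries(text: str) -> set:
    """Return every known country mentioned directly or through an alias."""
    lowered = text.lower()
    found = set()
    for country in TARGET_COUNTRIES | NON_TARGET_COUNTRIES:
        if country.lower() in lowered:
            found.add(country)
    for place, country in PLACE_ALIASES.items():
        if place in lowered:
            found.add(country)
    return found


def geo_decision(text: str) -> str:
    """Return "keep", "reject" or "unknown" for a notice title or body."""
    countries = resolve_countries(text)
    if countries & TARGET_COUNTRIES:
        return "keep"  # assumption: a target mention wins over a blacklisted one
    if countries & NON_TARGET_COUNTRIES:
        return "reject"
    return "unknown"
```

Plain substring matching is enough for this sample, but it would confuse names that contain one another (Niger and Nigeria, for instance), so a fuller version would need word-boundary matching.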
Technical sources
Today, 40+ donors are monitored through 40+ technical sources. Official APIs are prioritized (roughly 10x more reliable than scraping): TED Europa eForms v3, World Bank procurement search, OCDS Rwanda Umucyo. HTML scraping is used for direct portals: AfDB procurement notices, BADEA, KfW Development Bank, MCC, OPEC Fund, IsDB project-procurement, LuxDev, BOAD, BDEAC, EBRD, DBSA, NDB, PIDG, ENEO Cameroon, marchés-publics.gouv.fr (AFD/Proparco via PLACE), national ARMP (Burundi, Congo-Brazzaville), Cellule Infrastructures RDC, GIZ Vergabemarktplatz. Playwright is used for JS-heavy portals: GCF Oracle Procurement Cloud, UNDP Procurement Notices, UNGM (UN Global Marketplace via POST API). Each source is monitored with a health indicator visible in the admin interface.
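One plausible way to compute such a per-source health indicator is a success rate over a rolling window of scan runs. The sketch below is an assumption about how this could look, not the actual implementation: the ScanRun fields, the source_health function and the 7-day default window are chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScanRun:
    """Hypothetical record of one scan of one source."""
    source: str
    finished_at: datetime
    ok: bool
    captured: int = 0  # opportunities captured during this run


def source_health(runs, source: str, now: datetime, window_days: int = 7) -> dict:
    """Summarize the runs of one source that finished inside the window."""
    cutoff = now - timedelta(days=window_days)
    recent = [r for r in runs if r.source == source and r.finished_at >= cutoff]
    if not recent:
        # No data is reported as None rather than as a 0% success rate.
        return {"runs": 0, "success_rate": None, "captured": 0}
    successes = sum(1 for r in recent if r.ok)
    return {
        "runs": len(recent),
        "success_rate": successes / len(recent),
        "captured": sum(r.captured for r in recent),
    }
```

Returning None when a source has no recent runs keeps a silent source distinguishable from a failing one.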
Automatic opportunity enrichment
For each kept opportunity, a secondary worker fetches the detail page (HTML and, when available, the Terms of Reference PDF); the worker runs every 4 minutes in batches of 8. Sections are extracted via regex heuristics (no LLM): Overview, Components/Activities, Required Qualifications, Required Experts, Project Code, Financing Mode. E2E validation on an IsDB Cameroon opportunity (Batchenga-Ntui road): 6 sections correctly extracted + direct link to the PDF General Procurement Notice.
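A heading-based regex heuristic of this kind can be sketched as below. The heading patterns, the section keys and the assumption that each heading sits alone on its own line are illustrative choices for this example; the real heuristics are not reproduced here and cover more sections.

```python
import re
from typing import Optional

# Hypothetical heading patterns (FR + EN) for a few of the sections listed above.
SECTION_HEADINGS = {
    "overview": re.compile(r"overview|aperçu", re.IGNORECASE),
    "components": re.compile(r"components?(?:\s*/\s*activit(?:y|ies))?|composantes?", re.IGNORECASE),
    "qualifications": re.compile(r"required qualifications|qualifications requises", re.IGNORECASE),
    "experts": re.compile(r"required experts|experts requis", re.IGNORECASE),
}


def match_heading(line: str) -> Optional[str]:
    """Return the section key if the whole line is a known heading, else None."""
    candidate = line.strip().rstrip(":").strip()
    for key, pattern in SECTION_HEADINGS.items():
        if pattern.fullmatch(candidate):
            return key
    return None


def extract_sections(text: str) -> dict:
    """Group the non-empty lines that follow each recognized heading."""
    sections = {}
    current = None
    for line in text.splitlines():
        key = match_heading(line)
        if key is not None:
            current = key
            sections.setdefault(key, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {key: " ".join(lines) for key, lines in sections.items()}
```

Using fullmatch on the whole line keeps a sentence that merely contains the word "overview" from being mistaken for a heading. Text that appears before the first recognized heading is ignored in this sketch.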
Operational transparency
The internal administration interface exposes: source health over 7 days (success rate, opportunities captured, top classifier rejections), a per-source dry-run audit (re-scraping without touching the database to identify which items are rejected and why), and user feedback on classification (false positive / false negative). Each scan_run keeps a sample of rejected items (max 30 per category) for retrospective diagnosis. This transparency is the cornerstone of continuous classifier improvement.
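The capped rejection sample can be sketched as follows. The class name, the category labels and the choice to keep the first 30 items per category (rather than a random sample) are assumptions for this example; only the cap of 30 per category comes from the description above.

```python
from collections import defaultdict

MAX_SAMPLES_PER_CATEGORY = 30  # cap stated above


class RejectionSample:
    """Hypothetical per-scan-run container for rejected items."""

    def __init__(self, cap: int = MAX_SAMPLES_PER_CATEGORY):
        self.cap = cap
        self.counts = defaultdict(int)     # total rejections per category
        self.samples = defaultdict(list)   # at most `cap` titles per category

    def record(self, category: str, title: str) -> None:
        """Count every rejection, but store the title only while under the cap."""
        self.counts[category] += 1
        if len(self.samples[category]) < self.cap:
            self.samples[category].append(title)

    def top_rejections(self, n: int = 3) -> list:
        """Return the n most frequent rejection categories with their counts."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Keeping the full counts alongside the capped samples means the "top classifier rejections" figure stays exact even when only 30 examples per category are stored.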