| Student's name | SCIPER |
|---|---|
| Massimo Berardi | 345943 |
| Noam Ifergan | 341405 |
| Victor Nahoul | 339407 |
Milestone 1 • Milestone 2 • Milestone 3
10% of the final grade
This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas. Please fill in the following sections about your project.
(max. 2000 characters per section)
Find a dataset (or multiple) that you will explore. Assess the quality of the data it contains and how much preprocessing / data-cleaning it will require before tackling visualization. We recommend using a standard dataset as this course is not about scraping or data processing.
Hint: some good pointers for finding quality publicly available datasets (Google dataset search, Kaggle, OpenSwissData, SNAP and FiveThirtyEight).
We base our analysis on the FAO Detailed Trade Matrix dataset (1986–2024), compiled by the Food and Agriculture Organization of the United Nations (FAO). The data follow the standard International Merchandise Trade Statistics (IMTS) methodology and are mainly sourced from UNSD, Eurostat, and national authorities. For each pair of countries and year, the database reports export quantity, export value, import quantity, and import value for a wide range of food and agricultural products.
The dataset is designed for global coverage rather than completeness of every bilateral flow: many country pairs never trade a given product, which results in a relatively sparse matrix, but the reported figures are based on official national statistics and undergo consistency checks by FAO. This makes it well suited for high-level analyses of trade patterns and value chains.
To use the dataset for our project, we will have to narrow it down and reshape it. In particular, we will restrict the bulk download to coffee-related products only (for example keeping items such as “Coffee, green” and “Coffee, decaffeinated or roasted”), drop variables that are not relevant for our questions, and ensure that the remaining fields are consistent across years and reporters.
Our goal is to construct two working datasets: one for trade values and one for trade quantities. In both cases, we aim for a format where each row corresponds to a single, economically meaningful flow of coffee between two countries in a given year. This structure will allow us to interpret the global transformation chain, from raw beans to processed products.
Note. The original FAO bulk download is too large to be pushed to GitHub. However, the smaller preprocessed datasets resulting from these steps are included in the repository.
Frame the general topic of your visualization and the main axis that you want to develop.
- What am I trying to show with my visualization?
- Think of an overview for the project, your motivation, and the target audience.
The Hidden Coffee Chain. Who Really Transforms the Bean?
Coffee is one of the most traded commodities in the world, yet its supply chain remains deeply misunderstood. When observing only bilateral trade flows between countries, the picture becomes misleading.
Our visualization aims to reveal which countries are the true coffee transformers: those that import raw beans, process them through roasting, encapsulating, and packaging, and re-export them as higher-value products. To measure this, we will compare each country's trade in processed coffee with its trade in raw coffee, in both quantity and value. This comparison approximates the value added within each country. The central axis is transformation: tracing the journey from raw beans to finished products. Rather than simply mapping who sells to whom, we want to show where economic value is actually captured along the chain.
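As a rough illustration of this comparison, here is a minimal sketch in plain Python. The indicator names (`net_raw_imports`, `net_processed_exports`, `markup`) and all numbers are hypothetical and chosen only for illustration; they are one possible reading of "difference between processed and raw coffee trade", not the final metric of the project.

```python
def net_flows(raw_imp, raw_exp, proc_imp, proc_exp):
    """Net raw imports and net processed exports for one country-year (quantities)."""
    return {
        "net_raw_imports": raw_imp - raw_exp,
        "net_processed_exports": proc_exp - proc_imp,
    }


def unit_value(value, quantity):
    """Trade value per unit of quantity; None when no quantity is reported."""
    return value / quantity if quantity else None


# Toy numbers in arbitrary units (NOT taken from the FAO data).
qty = net_flows(raw_imp=100.0, raw_exp=10.0, proc_imp=5.0, proc_exp=60.0)
raw_price = unit_value(value=300.0, quantity=100.0)   # value per unit of raw imports
proc_price = unit_value(value=900.0, quantity=60.0)   # value per unit of processed exports
markup = proc_price / raw_price                       # how much more a processed unit is worth

print(qty, raw_price, proc_price, markup)
```

A country with large net raw imports, positive net processed exports, and a high markup would be a candidate "transformer" under this reading.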
Our target audience is the general public and students interested in economics or sustainability, who question the North-South inequalities embedded in global value chains. The motivation is straightforward: behind every cup of coffee lies an economic geography.
Pre-processing of the data set you chose
- Show some basic statistics and get insights about the data
Starting from the full FAO table, we isolated coffee by keeping only “Coffee, green” and “Coffee, decaffeinated or roasted”. We removed redundant columns and split the data into two aligned tables for trade values and quantities, keeping only import and export flows.
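The filtering and splitting step can be sketched as follows. This is a dependency-free illustration, not the notebook code: the field names (`Reporter`, `Partner`, `Item`, `Element`, `Y2020`) and the element labels are assumptions about the table layout, and the rows are toy data.

```python
# Item and element labels are assumptions about how the table encodes them.
COFFEE_ITEMS = {"Coffee, green", "Coffee, decaffeinated or roasted"}
VALUE_ELEMENTS = {"Import value", "Export value"}
QUANTITY_ELEMENTS = {"Import quantity", "Export quantity"}


def split_coffee(rows):
    """Keep coffee rows only and split them into a value table and a quantity table."""
    values, quantities = [], []
    for row in rows:
        if row["Item"] not in COFFEE_ITEMS:
            continue  # drop every non-coffee product
        if row["Element"] in VALUE_ELEMENTS:
            values.append(row)
        elif row["Element"] in QUANTITY_ELEMENTS:
            quantities.append(row)
    return values, quantities


# Toy rows (made up for illustration).
rows = [
    {"Reporter": "A", "Partner": "B", "Item": "Coffee, green",
     "Element": "Export value", "Y2020": 10.0},
    {"Reporter": "A", "Partner": "B", "Item": "Coffee, green",
     "Element": "Export quantity", "Y2020": 4.0},
    {"Reporter": "A", "Partner": "B", "Item": "Wheat",
     "Element": "Export value", "Y2020": 7.0},
]
values, quantities = split_coffee(rows)
print(len(values), len(quantities))
```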
We then assessed data completeness over time. While coverage is generally low (<45%) due to rare bilateral combinations, quality is high: official national statistics (flag A) account for ~98% of entries. Consequently, we dropped auxiliary source flags to focus on the primary series.
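The two indicators mentioned above (coverage and share of flag A) could be computed along these lines. This is a sketch under assumptions: the column naming (`Y2019` for a year, `Y2019F` for its flag) and the toy rows are hypothetical, and the exact definition of coverage used in the notebook may differ.

```python
def coverage(rows, year_cols):
    """Share of (row, year) cells that hold a reported number (not None)."""
    total = len(rows) * len(year_cols)
    filled = sum(1 for r in rows for y in year_cols if r.get(y) is not None)
    return filled / total if total else 0.0


def flag_share(rows, flag_cols, flag="A"):
    """Share of non-empty flag cells that equal `flag`."""
    flags = [r[c] for r in rows for c in flag_cols if r.get(c)]
    return flags.count(flag) / len(flags) if flags else 0.0


# Toy rows (made up); "E" stands in for any non-official flag.
rows = [
    {"Y2019": 5.0, "Y2019F": "A", "Y2020": None, "Y2020F": None},
    {"Y2019": None, "Y2019F": None, "Y2020": 2.0, "Y2020F": "A"},
    {"Y2019": 1.0, "Y2019F": "E", "Y2020": 3.0, "Y2020F": "A"},
]
cov = coverage(rows, ["Y2019", "Y2020"])
share_a = flag_share(rows, ["Y2019F", "Y2020F"])
print(cov, share_a)
```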
To avoid structural zeros and structurally empty records, we removed: (i) columns that are entirely missing, and (ii) rows with only zero or missing trade across all years. We also aligned the value and quantity tables so that they share the same subset of observations, and saved the resulting matrices for downstream analysis.
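A minimal sketch of these cleaning and alignment steps, again in plain Python on toy rows. The field names and the matching key (reporter, partner, item, flow direction) are assumptions made for this example, not necessarily the choices made in the notebook.

```python
def drop_empty(rows, year_cols):
    """Drop all-missing year columns, then rows with only zero or missing trade."""
    kept_cols = [y for y in year_cols if any(r.get(y) is not None for r in rows)]
    kept_rows = [
        r for r in rows
        if any(r.get(y) is not None and r.get(y) != 0 for y in kept_cols)
    ]
    return kept_rows, kept_cols


def key(row):
    """Identify a flow; direction is the first word of e.g. 'Export value'."""
    return (row["Reporter"], row["Partner"], row["Item"], row["Element"].split()[0])


def align(values, quantities):
    """Keep only the flows that appear in both tables."""
    shared = {key(r) for r in values} & {key(r) for r in quantities}
    return ([r for r in values if key(r) in shared],
            [r for r in quantities if key(r) in shared])


def make(partner, element, y2020):
    # Helper to build a toy row; Y2019 is missing everywhere on purpose.
    return {"Reporter": "A", "Partner": partner, "Item": "Coffee, green",
            "Element": element, "Y2019": None, "Y2020": y2020}


years = ["Y2019", "Y2020"]
values = [make("B", "Export value", 10.0), make("C", "Export value", 0.0)]
quantities = [make("B", "Export quantity", 4.0), make("C", "Export quantity", 1.0)]

values, v_cols = drop_empty(values, years)
quantities, q_cols = drop_empty(quantities, years)
v_aligned, q_aligned = align(values, quantities)
print(v_cols, len(v_aligned), len(q_aligned))
```

In this toy case the A to C flow is dropped from both tables because its value row contains only zeros, so the two outputs describe the same single flow.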
Finally, we produced exploratory plots of coffee trade quantities for specific countries. For Germany, Brazil, and Switzerland, we visualized (i) total imports vs exports over time, and (ii) a more detailed breakdown into raw vs processed coffee. These first visualizations already hint at the distinct functional roles of countries in the supply chain. For example, Brazil appears as a major exporter of green coffee, Germany as an important hub for importing and re-exporting (including processed coffee), and Switzerland as a high-value processing and re-export center. In our final plot, we explicitly contrast Switzerland's trade in physical quantities with its trade values, highlighting how relatively modest volumes can translate into disproportionately high export value once coffee is processed and re-exported. Together, these patterns provide an initial empirical basis for our subsequent value-chain analysis of where coffee is processed and where value is captured.
All the steps are detailed in the EDA notebook: EDA.ipynb
- What others have already done with the data?
- Why is your approach original?
- What source of inspiration do you take? Visualizations that you found on other websites or magazines (might be unrelated to your data).
- In case you are using a dataset that you have already explored in another context (ML or ADA course, semester project...), you are required to share the report of that work to outline the differences with the submission for this class.
Several existing projects already visualize global coffee data, but most focus on country-level summaries or bilateral exchanges rather than modeling coffee as a transformation process along a global value chain.
A useful reference is the coffee_worldwide_ETL project, which combines choropleths and standard charts to explore global coffee production, consumption, and trade. It highlights key patterns such as the divide between producing and consuming countries, but treats geography mainly as a set of indicators rather than a system of flows.
A closely related project from previous years is the Sundial Coffee Visualization Project. It explores coffee trade with an emphasis on monetary flows and qualitative aspects such as aromas. While conceptually similar, it focuses on value representation, whereas our project targets structural transformation in the supply chain, identifying where raw coffee is processed and value is added.
The closest inspiration is Resource Trade Earth, which provides interactive flow maps of global trade. While it effectively shows bilateral exchanges, it does not capture the functional role of countries within the value chain. Our approach builds on such flows to distinguish producers, processors, and consumers, and to reveal where value is created and captured.
Academic work, such as Utrilla-Catalan et al. (2022), models coffee trade as a network and highlights structural inequalities, but remains focused on analysis rather than interactive storytelling.
Our contribution combines these perspectives while shifting the focus toward a central question: where is coffee actually processed, and who captures its value?
This dataset has not been used by our team before.
10% of the final grade
The second milestone report is available here: Milestone 2 Report, with the MVP available here: Milestone 2 MVP.
80% of the final grade
- < 24h: 80% of the grade for the milestone
- < 48h: 70% of the grade for the milestone