Summary and Setup
After following this two-day lesson, learners will be able to:
- identify key open energy data sources suitable for answering energy research questions
- read in tabular data from XML, JSON, and Parquet formats using pandas
- request data stored in cloud buckets
- request data from a variety of Application Programming Interfaces (APIs)
- scrape data from webpages using beautifulsoup
- visualize data to quickly understand patterns and anomalies
- write Python classes and functions to break down complex cleaning tasks into reusable and discrete steps
- write automated tests to ensure that their code works as expected
- troubleshoot performance issues, and handle data that is too large to fit in memory
- automatically detect unexpected values in inputs and outputs by writing data validation tests
- transform a local codebase into a collaborative project using GitHub repositories, code documentation, and virtual environments.
Setup
You’ll need a few things set up before starting the course:
- the course materials should be downloaded to your computer
- the Python libraries used in the course should be installed
We’ll assume you have some familiarity with the command line:
- you can run commands
- you can navigate directories with cd
- you can inspect directories with ls (or dir if you’re using Command Prompt on Windows)
Getting the course materials
The course materials are hosted on GitHub and you’ll need
git installed to access them. Fortunately git
is free!
Try opening a terminal window and running:
If you get some sort of “command not found” error, follow the official installation instructions.
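One way to run that check, sketched for a POSIX-style shell, is to ask git for its version. The check_tool helper and its messages are hypothetical, added here only for illustration.

```bash
# Hypothetical helper: report whether a command-line tool is on your PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found"
    else
        echo "$1 not found"
        return 1
    fi
}

# Print git's version if it is installed; otherwise point to the install docs.
if check_tool git; then
    git --version
else
    echo "Follow the official git installation instructions, then try again."
fi
```

The same helper works for any other command-line tool, for example check_tool uv.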
Once you have git installed, open a terminal window and
download a copy of the materials using git clone.
Let’s assume you have a courses directory that you want
to store the course materials under. This will download the materials to
courses/open-energy-data-for-all.
BASH
% cd courses
courses/ % git clone https://github.com/catalyst-cooperative/open-energy-data-for-all.git
If you open the open-energy-data-for-all directory you
just made, you should be able to see the course materials.
Installing the Python libraries
To install the Python libraries this course depends on, you will need
uv.
If you don’t have uv installed, check out their official
installation documentation.
Once you’ve installed uv, you can use it to install the
Python libraries into an isolated environment only for this course.
- Using a terminal, enter the course repository you downloaded above:
- Install the libraries:
- Check that the dependencies were installed by opening a Jupyter notebook:
You should see a directory listing in your browser:
[Figure: a directory listing, showing the contents of the course repository.]
Click on /notebooks, and then double-click the 00-test-installation.ipynb notebook. You should see a single cell:
[Figure: a cell that imports the high-level dependencies we need in the course.]
Run that cell. If it doesn’t print Success!, then you’re missing some dependencies for the course. Double-check:
- that you ran jupyter notebook with uv run jupyter notebook
- that the libraries that are imported are listed in pyproject.toml
If both are true, contact your instructor for help.
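As a rough sketch, the three installation steps above might look like the following in a POSIX-style shell. The command uv run jupyter notebook is the one named in the checklist; cd open-energy-data-for-all and uv sync are assumptions about how the repository is entered and how uv installs the libraries listed in pyproject.toml, and setup_course is a hypothetical helper name.

```bash
# Hypothetical helper wrapping the three setup steps.
# Nothing runs until you call the function yourself.
setup_course() {
    # 1. Enter the cloned repository (assumes you are in its parent directory).
    cd open-energy-data-for-all || return 1
    # 2. Install the libraries into an isolated environment (assumed command).
    uv sync || return 1
    # 3. Start Jupyter inside that environment.
    uv run jupyter notebook
}
```

Call setup_course from your courses directory. Because each step stops on failure, a missing repository or a failed install is reported before Jupyter tries to start.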