6,222 Introduction to Programming
In-class Exercise: Git and GitHub for Collaboration
Overview
Goal: By the end of this exercise, you and your partner will share a GitHub repository, each working on your own branch, with a merged pull request (PR) in the history. This is the workflow we would like you to use for your group project in this course, and it works just as well for group projects in R, Python, or any other programming language.
Work in pairs. Decide who is Student A (the repository owner) and who is Student B (the collaborator).
Phase 1: Create and Link a Repository
Student A
Create a repository and invite your collaborator
- Go to github.com and click New repository
- Name it collab-exercise-week03 and set the visibility to Public
- Add a README, but do not add a .gitignore file
- Click Create repository and copy the URL
- Invite Student B as a collaborator: Settings → Collaborators → Add people
- Clone the repository into exercises/week_03:
cd exercises/week_03
git clone <your repo URL>
cd collab-exercise-week03
Note: Cloning with the SSH URL requires a configured SSH key. If SSH is not set up yet, use the HTTPS URL, which lets you clone a public repository without a key (pushing over HTTPS still requires you to authenticate with GitHub).
Student B
Accept the invitation and clone
- Accept the collaborator invitation (check GitHub notifications, email, and spam)
- Clone the repository into exercises/week_03:
cd exercises/week_03
git clone <Student A's repo URL>
cd collab-exercise-week03
Phase 2: Set Up the Environment
In this part, we will set up the environment and generate a dataset using a simple Python script. The script generate_data.py creates a CSV file with GDP data for a few countries:
import pandas as pd
data = pd.DataFrame({
"country": ["Switzerland", "Germany", "France", "Italy", "Spain"],
"gdp_bn_usd": [800, 4000, 2800, 2100, 1400],
"year": 2023
})
data.to_csv("gdp_data.csv", index=False)
print("File written: gdp_data.csv")
Student A
Create a virtual environment and add the data script
- In the repository, create a virtual environment and install pandas:
uv init --python 3.13
uv add pandas
- Download generate_data.py (available on Canvas) and copy it into collab-exercise-week03
- Using the terminal or a text editor, create a .gitignore that excludes the virtual environment and Python cache files. In the terminal, you can use echo like this:
echo ".venv/
__pycache__/
*.pyc" > .gitignore
The .gitignore file should now look like this:
.venv/
__pycache__/
*.pyc
Note: > creates the file (or overwrites it if it already exists); >> (used later) appends to an existing file.
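Python's built-in open() has modes that behave like these two operators. Mode "w" works like > (create or overwrite) and mode "a" works like >> (append). The minimal sketch below runs in a temporary folder, an assumption made so that your real .gitignore is never touched. It rebuilds the .gitignore this exercise ends up with once Student B appends gdp_data.csv later on. It also shows what a second > would do.

```python
import tempfile
from pathlib import Path

# Work in a throwaway folder so a real .gitignore is never touched
with tempfile.TemporaryDirectory() as tmp:
    gitignore = Path(tmp) / ".gitignore"

    # Like `echo "..." > .gitignore`: mode "w" creates the file or overwrites it
    with open(gitignore, "w") as f:
        f.write(".venv/\n__pycache__/\n*.pyc\n")

    # Like `echo "gdp_data.csv" >> .gitignore`: mode "a" appends at the end
    with open(gitignore, "a") as f:
        f.write("gdp_data.csv\n")
    after_append = gitignore.read_text()

    # Running the first command (with >) again would drop the appended line
    with open(gitignore, "w") as f:
        f.write(".venv/\n__pycache__/\n*.pyc\n")
    after_overwrite = gitignore.read_text()

print(after_append)     # four patterns, gdp_data.csv last
print(after_overwrite)  # back to the original three patterns
```

In practice, use > only when creating .gitignore for the first time. Use >> whenever the file already exists.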
- Commit and push:
git add .
git commit -m "Add virtual environment and data generation script"
git push
Student B
Sync the environment and generate the data
- Pull Student A’s changes and sync the virtual environment:
git pull
uv sync
Note: uv sync recreates the virtual environment from uv.lock, so you get exactly the same environment that Student A set up. You don’t need to run uv init yourself.
- Run the data generation script, either directly with uv run or from VS Code by selecting the Python interpreter in the project’s .venv folder (as in the first exercise session):
uv run python generate_data.py
- gdp_data.csv is now generated locally. Remember from the lecture: generated files should not be tracked. Add it to .gitignore and push:
echo "gdp_data.csv" >> .gitignore
git add .gitignore
git commit -m "Ignore generated CSV file"
git push
Student A
Pull the changes
git pull
Checkpoint: At this point, both students have generate_data.py and the same .gitignore. The CSV exists only where the script has been run (so far, on Student B’s machine) and is not tracked by Git.
Phase 3: Branch and Pull Request
Student B
Create a branch and extend the dataset
- Create a new branch:
git switch -c add-population-data
- Open generate_data.py and add a population_mn column:
data = pd.DataFrame({
"country": ["Switzerland", "Germany", "France", "Italy", "Spain"],
"gdp_bn_usd": [800, 4000, 2800, 2100, 1400],
"population_mn": [8.7, 84.4, 68.2, 59.0, 47.4],
"year": 2023
})
data.to_csv("gdp_data.csv", index=False)
- Commit and push the branch:
git add generate_data.py
git commit -m "Add population data to analysis"
git push -u origin add-population-data
- Open a Pull Request on GitHub:
- GitHub will show a banner saying “add-population-data had recent pushes”; click Compare & pull request
- Title: Add population data to GDP analysis
- Write a short description of what you changed and why
- Click Create pull request
Student A
Review the pull request
- Open the Pull Request on GitHub and go to the Files changed tab. It lists every change the branch would bring into main if merged. Check that only generate_data.py was modified.
- Leave a review comment requesting one more change:
“Great work! Before we merge, can you add a gdp_per_capita_usd column? It should be GDP in billions of USD divided by population in millions, scaled to USD per person.”
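The Files changed tab from the first review step shows a line-by-line diff. Lines starting with + were added and lines starting with - were removed. git diff prints the same unified format in the terminal. As a rough local illustration, Python's standard difflib module can produce that format for an abbreviated excerpt of generate_data.py. The excerpt and the labels "main" and "add-population-data" are for illustration only, and GitHub's own view is richer. A correct first version of this PR should show exactly one added line and nothing removed.

```python
import difflib

# Abbreviated excerpt of generate_data.py on main (the print line is left out)
before = '''data = pd.DataFrame({
    "country": ["Switzerland", "Germany", "France", "Italy", "Spain"],
    "gdp_bn_usd": [800, 4000, 2800, 2100, 1400],
    "year": 2023
})
data.to_csv("gdp_data.csv", index=False)
'''.splitlines(keepends=True)

# The same excerpt on the add-population-data branch: one new line after gdp_bn_usd
after = before[:3] + ['    "population_mn": [8.7, 84.4, 68.2, 59.0, 47.4],\n'] + before[3:]

diff = list(difflib.unified_diff(before, after, fromfile="main", tofile="add-population-data"))
print("".join(diff))

# Added (+) and removed (-) lines, skipping the "+++"/"---" file headers
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
```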
Student B
Address the review
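Add the column to generate_data.py. Place it after the DataFrame is created and before the data.to_csv(...) line, otherwise the new column will not be written to the CSV:
data["gdp_per_capita_usd"] = data["gdp_bn_usd"] / data["population_mn"] * 1000
Commit and push. The PR updates automatically: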
git add generate_data.py
git commit -m "Add GDP per capita column"
git push
Student A
Approve and merge
- Check the updated Files changed tab and verify the formula is correct
- Approve and click Merge pull request
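When verifying the formula, it is worth checking the units once. One billion is 10^9 and one million is 10^6. GDP in billions divided by population in millions is therefore in thousands of USD per person, and multiplying by 1000 converts it to USD per person. The quick plain-Python check below uses the Switzerland values from generate_data.py, chosen only as an example. It compares the formula with the same division done in raw units.

```python
# Values for Switzerland from generate_data.py
gdp_bn_usd = 800       # GDP in billions of USD
population_mn = 8.7    # population in millions of people

# Formula from the PR: billions / millions gives thousands of USD per person,
# so multiply by 1000 to get USD per person
per_capita_formula = gdp_bn_usd / population_mn * 1000

# Same quantity computed in raw units: USD divided by number of people
per_capita_raw = (gdp_bn_usd * 1_000_000_000) / (population_mn * 1_000_000)

print(round(per_capita_formula))  # prints 91954
print(abs(per_capita_formula - per_capita_raw) < 1e-6)  # prints True
```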
Both students sync main:
git switch main
git pull
Checkpoint: Student B’s changes are in main. Both students are in sync. The full collaborative workflow is complete.
Ensuring that your feature/working branch is up to date with the main branch (self-study)
When working in a team, the main branch changes frequently. Before continuing your work (or opening a pull request), you should bring your feature branch up to date with main.
There are two safe and common ways to do this.
This part is more advanced and not required for the course. It is, however, an important workflow to understand and master for any collaborative project.
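The two options below end with the same changes but leave behind differently shaped histories. The Python sketch that follows is a deliberately simplified toy model, not Git's real data structures. It assumes each commit is just a name mapped to the list of its parent commits; real commits also store snapshots, authors, and hashes. In this hypothetical scenario you branched off at B. Since then, main has gained C and D, and your branch has X and Y. A merge adds one merge commit (called M here) with two parents. A rebase replays X and Y on top of D as new commits, written X' and Y' here, just as Git gives rebased commits new hashes. After a rebase, every commit has a single parent and the history is a straight line.

```python
# Toy model: each commit maps to the list of its parent commits.
history = {
    "A": [], "B": ["A"],
    "C": ["B"], "D": ["C"],   # new commits on main
    "X": ["B"], "Y": ["X"],   # your commits on the feature branch
}

def merge(history, feature_tip, main_tip):
    """Option 1: keep every commit and add one merge commit with two parents."""
    new = dict(history)
    new["M"] = [feature_tip, main_tip]
    return new, "M"

def rebase(history, feature_commits, main_tip):
    """Option 2: replay each feature commit on top of main as a new commit."""
    # The original X and Y are no longer part of the branch
    new = {c: p for c, p in history.items() if c not in feature_commits}
    parent = main_tip
    for commit in feature_commits:
        replayed = commit + "'"   # a new commit carrying the same change
        new[replayed] = [parent]
        parent = replayed
    return new, parent

def first_parent_line(history, tip):
    """Follow first parents from the tip back to the root, oldest first."""
    line = []
    while tip is not None:
        line.append(tip)
        parents = history[tip]
        tip = parents[0] if parents else None
    return line[::-1]

def reachable(history, tip):
    """All commits whose changes are included in the branch ending at tip."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(history[commit])
    return seen

merged, merged_tip = merge(history, "Y", "D")
rebased, rebased_tip = rebase(history, ["X", "Y"], "D")

print(first_parent_line(merged, merged_tip))    # ['A', 'B', 'X', 'Y', 'M']
print(first_parent_line(rebased, rebased_tip))  # ['A', 'B', 'C', 'D', "X'", "Y'"]
```

Both results include the changes from C and D, which is what reachable() confirms. Only the shape differs: the merged history contains a commit with two parents, while the rebased one is a single straight line.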
Option 1: Merge main into your feature branch
# Make sure you are on your feature/working branch
git switch <feature-branch>
# Download the newest changes from GitHub. This updates origin/main (your local copy of GitHub's main) but does not change your branch yet.
git fetch origin
# Merge origin/main into your branch. Afterwards, your branch contains all changes from main.
git merge origin/main
# If there are conflicts, Git will tell you. Fix them, then run:
git add .
git commit
# Push the updated branch to GitHub
git push
Option 2: Rebase your branch onto main
This keeps the commit history linear (no merge commits), but because it rewrites your branch's commits, it is slightly more advanced.
git switch <feature-branch>
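# Download the newest changes from GitHub. This updates origin/main (your local copy of GitHub's main) but does not change your branch yet.
git fetch origin
# Replay your branch's commits on top of origin/main
git rebase origin/main
# If there are conflicts, Git will tell you. Fix them, then run:
git add .
git rebase --continue
# Repeat (fix -> git add -> git rebase --continue) until Git says something like "Successfully rebased and updated <feature-branch>."
# To give up and return to the state before the rebase, run: git rebase --abort
# Rebasing creates new commits, so if the branch was already pushed, update GitHub with:
git push --force-with-lease
Summary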
You have practised the core collaborative Git workflow:
| Step | Command(s) |
|---|---|
| Clone an existing repo | git clone |
| Ignore generated files | .gitignore |
| Sync a virtual environment | uv sync |
| Get a collaborator’s changes | git pull |
| Work on a branch | git switch -c, git push -u origin <branch> |
| Contribute via PR | GitHub web interface |
From now on, this is the workflow you will use for your group project: each team member works on their own branch and contributes changes via pull requests.