Python is one of the most popular programming languages in the world — and for good reason. It’s readable, powerful, and used everywhere from data engineering to web development to AI. If you’re just starting out, this guide covers the Python basics you need to write real, working code from day one.
Why Python? A Quick Case for Learning It
- Readable syntax — Python code reads almost like plain English
- Versatile — used in data engineering, automation, ML, web apps, scripting
- Huge ecosystem — thousands of libraries (pandas, boto3, pyspark, fastapi…)
- In-demand — regularly ranks among the most-used languages in developer surveys
Setting Up Python
Install Python from python.org (3.10+ recommended). Verify it works (on Windows, the command may be python or py rather than python3):
python3 --version
# Python 3.11.4
For an editor, VS Code with the Python extension is a solid choice.
Variables and Assignment
A variable stores a value. In Python, you don’t declare types — just assign:
name = "Ruby"
age = 30
is_engineer = True
pi = 3.14159
Python figures out the type automatically. You can reassign variables freely:
x = 10
x = "now a string" # totally valid in Python
Python Data Types
Python has several built-in data types you will use constantly:
| Type | Example | Notes |
|---|---|---|
| int | 42 | Whole numbers |
| float | 3.14 | Decimal numbers |
| str | "hello" | Text, in quotes |
| bool | True / False | Booleans |
| list | [1, 2, 3] | Ordered, mutable |
| tuple | (1, 2, 3) | Ordered, immutable |
| dict | {"key": "value"} | Key-value pairs |
| set | {1, 2, 3} | Unique values |
type(42) # <class 'int'>
type("hello") # <class 'str'>
type([1,2,3]) # <class 'list'>
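To make the Notes column concrete, here is a minimal sketch of what "mutable", "immutable", and "unique values" mean in practice. The variable names are made up for illustration.

```python
# Illustrative names only; nothing here comes from a library.
readings = [3, 1, 3]              # list: ordered and mutable
readings.append(2)                # changing it in place is allowed

point = (10, 20)                  # tuple: ordered and immutable
try:
    point[0] = 99                 # tuples do not support item assignment
except TypeError as exc:
    tuple_error = str(exc)

unique_readings = set(readings)   # set: duplicates collapse to one value

print(readings)                   # [3, 1, 3, 2]
print(tuple_error)
print(sorted(unique_readings))    # [1, 2, 3]
```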
Strings: The Basics
name = "Ruby"
# Concatenation
greeting = "Hello, " + name # "Hello, Ruby"
# f-strings (preferred in modern Python)
greeting = f"Hello, {name}!" # "Hello, Ruby!"
# Useful string methods
"python".upper() # "PYTHON"
" hello ".strip() # "hello"
"a,b,c".split(",") # ["a", "b", "c"]
len("hello") # 5
Lists: Ordered Collections
tools = ["Python", "SQL", "Spark"]
tools[0] # "Python"
tools[-1] # "Spark"
tools.append("dbt")
tools.remove("SQL")
tools[0:2] # ["Python", "Spark"]
for tool in tools:
    print(tool)
Dictionaries: Key-Value Storage
engineer = {
    "name": "Ruby",
    "skills": ["Python", "AWS", "Spark"],
    "years_experience": 5
}
engineer["name"] # "Ruby"
engineer.get("location") # None (no KeyError for a missing key)
engineer["location"] = "London"
for key, value in engineer.items():
    print(f"{key}: {value}")
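One place .get() earns its keep is counting, because it can return a default when the key is not there yet. A minimal sketch, assuming a made-up list of job statuses:

```python
statuses = ["success", "failed", "success", "success"]  # made-up sample data

counts = {}
for status in statuses:
    # .get(status, 0) returns 0 the first time a status is seen
    counts[status] = counts.get(status, 0) + 1

print(counts)  # {'success': 3, 'failed': 1}
```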
Control Flow: if / elif / else
score = 85
if score >= 90:
    print("Excellent")
elif score >= 70:
    print("Good")
else:
    print("Needs work")
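Python uses indentation to define code blocks, not curly braces. Four spaces per level is the convention (PEP 8); whatever you choose, the indentation within a block must be consistent.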
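Indentation also nests: a block inside another block sits one level deeper. Here is a small sketch that puts the same if / elif / else inside a loop, using sample scores chosen for illustration.

```python
scores = [95, 72, 40]  # sample values, chosen for illustration
labels = []

for score in scores:                  # level 1: the loop body
    if score >= 90:                   # level 2: inside the loop
        labels.append("Excellent")
    elif score >= 70:
        labels.append("Good")
    else:
        labels.append("Needs work")

print(labels)  # ['Excellent', 'Good', 'Needs work']
```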
Loops
for loop
numbers = [1, 2, 3, 4, 5]
for n in numbers:
    print(n * 2)
for i in range(5):
    print(i) # 0 1 2 3 4
while loop
count = 0
while count < 3:
    print(count)
    count += 1
List Comprehensions (Pythonic Shorthand)
squares = [x**2 for x in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
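If the shorthand looks cryptic, it may help to see it next to the loop it replaces. The sketch below also adds an optional if clause, which filters items before they are collected.

```python
# The comprehension shown above...
squares = [x**2 for x in range(10)]

# ...is shorthand for this loop
squares_loop = []
for x in range(10):
    squares_loop.append(x**2)

# An optional `if` clause filters items; here only even numbers are squared
even_squares = [x**2 for x in range(10) if x % 2 == 0]

print(squares == squares_loop)  # True
print(even_squares)             # [0, 4, 16, 36, 64]
```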
Functions
def greet(name):
    return f"Hello, {name}!"
greet("Ruby") # "Hello, Ruby!"
Default Arguments
def connect(host, port=5432):
    return f"Connecting to {host}:{port}"
connect("localhost") # uses port 5432
connect("prod-db", 3306) # overrides port
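One thing to watch with defaults: the default value is created once, when the function is defined, not on every call. That rarely matters for numbers or strings, but a mutable default such as a list is shared between calls. The sketch below uses two hypothetical helpers, add_tag_buggy and add_tag, to show the problem and a common workaround.

```python
def add_tag_buggy(tag, tags=[]):      # the default list is created once, at definition time
    tags.append(tag)
    return tags

def add_tag(tag, tags=None):          # common workaround: use None as the default
    if tags is None:
        tags = []                     # a fresh list on every call
    tags.append(tag)
    return tags

first = add_tag_buggy("etl")
second = add_tag_buggy("daily")       # same list object as `first`
print(second)                         # ['etl', 'daily']

print(add_tag("etl"))                 # ['etl']
print(add_tag("daily"))               # ['daily']
```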
*args and **kwargs
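def log(*args, **kwargs):
    print(args)    # positional arguments, collected into a tuple
    print(kwargs)  # keyword arguments, collected into a dict
log("error", "timeout", level="WARN", service="api")
# ('error', 'timeout')
# {'level': 'WARN', 'service': 'api'}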
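Because args is a tuple and kwargs is a dict, you can work with them like any other tuple or dict. As a rough sketch, here is a hypothetical format_log helper (not part of any library) that builds a single log line from both.

```python
def format_log(*args, **kwargs):
    """Hypothetical helper: join positional parts, then append key=value pairs."""
    message = " ".join(args)                                    # args is a tuple
    details = ", ".join(f"{k}={v}" for k, v in kwargs.items())  # kwargs is a dict
    return f"{message} [{details}]" if details else message

line = format_log("error", "timeout", level="WARN", service="api")
print(line)  # error timeout [level=WARN, service=api]
```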
Error Handling
try:
    result = 10 / 0
except ZeroDivisionError as e:
    print(f"Error: {e}")
finally:
    print("Always runs")
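In real code, error handling usually lives inside a function so the caller gets a predictable result. A minimal sketch, using a hypothetical safe_divide helper and the optional else clause, which runs only when no exception was raised:

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: return the quotient, or None if division fails."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        return None          # handled: the caller gets None instead of a crash
    else:
        return result        # runs only when no exception was raised

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # None
```

Returning None is only one possible design. Depending on the situation, re-raising the error or returning a default value may be a better fit.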
Importing Modules
import os
import json
from datetime import datetime
# Third-party (install with pip)
import pandas as pd
import boto3
# Run this in your terminal, not inside Python, to install them:
pip install pandas boto3
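To show what an import gives you, here is a short sketch that uses only the three standard-library modules above, so it needs no pip install. The record, date, and output folder name are arbitrary examples.

```python
import json
import os
from datetime import datetime

record = {"job": "ingest_s3", "status": "success"}   # sample data for illustration

payload = json.dumps(record)             # dict -> JSON string
restored = json.loads(payload)           # JSON string -> dict

run_date = datetime(2024, 1, 15)         # arbitrary example date
filename = f"{restored['job']}_{run_date:%Y%m%d}.json"
path = os.path.join("output", filename)  # joins parts with the OS path separator

print(payload)
print(path)
```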
A Mini Project: Putting It All Together
def summarise_pipeline(jobs):
    """Summarise a list of pipeline job results."""
    total = len(jobs)
    succeeded = sum(1 for j in jobs if j["status"] == "success")
    failed = total - succeeded
    # Guard against an empty list to avoid dividing by zero
    rate = (succeeded / total) * 100 if total else 0.0
    return {
        "total": total,
        "succeeded": succeeded,
        "failed": failed,
        "success_rate": f"{rate:.1f}%"
    }
jobs = [
    {"name": "ingest_s3", "status": "success"},
    {"name": "transform_spark", "status": "success"},
    {"name": "load_redshift", "status": "failed"},
]
print(summarise_pipeline(jobs))
# {'total': 3, 'succeeded': 2, 'failed': 1, 'success_rate': '66.7%'}
FAQ: Python Basics
Is Python good for data engineering?
Yes. Python is one of the most widely used languages in data engineering. Libraries like pandas, SQLAlchemy, and boto3 are written for it, and PySpark provides a Python API for Apache Spark.
Do I need to understand data types before writing Python?
Yes, even at a basic level. Knowing the difference between a list and a dict will save you hours of debugging.
What is the difference between a list and a tuple in Python?
A list is mutable (you can change it); a tuple is immutable (fixed after creation). Use tuples for data that should not change — like coordinates or config pairs.
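Immutability has a practical payoff: a tuple of immutable values can be used as a dictionary key, while a list cannot. A small sketch with illustrative coordinates:

```python
# Illustrative data: coordinates stored as tuples and used as dictionary keys
office_locations = {
    (51.5074, -0.1278): "London",
    (40.7128, -74.0060): "New York",
}

london = (51.5074, -0.1278)
latitude, longitude = london          # tuple unpacking

print(office_locations[london])       # London

# A list cannot be a dictionary key because it is mutable (unhashable)
try:
    office_locations[[51.5074, -0.1278]] = "London again"
except TypeError as exc:
    print(f"TypeError: {exc}")
```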
What is an f-string in Python?
An f-string (formatted string literal) is the modern way to embed variables in strings. Prefix the string with f and wrap variables in {}: f"Hello, {name}".
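The braces can hold more than a bare variable. After a colon you can add a format spec, which is how the mini project prints its percentage. A short sketch with made-up values:

```python
name = "Ruby"
success_rate = 2 / 3      # sample values, chosen for illustration
rows_loaded = 1234567

greeting = f"Hello, {name}!"
rate_text = f"Success rate: {success_rate:.1%}"   # percent with one decimal place
rows_text = f"Rows loaded: {rows_loaded:,}"       # thousands separators
debug_text = f"{name=}"                           # name plus repr of its value (Python 3.8+)

print(greeting)    # Hello, Ruby!
print(rate_text)   # Success rate: 66.7%
print(rows_text)   # Rows loaded: 1,234,567
print(debug_text)  # name='Ruby'
```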
How is Python different from other languages?
Python uses indentation for code blocks (not braces), has dynamic typing, and prioritises readability. Pure Python code typically runs slower than equivalent C or Java, but it is often faster to write and maintain.
What should I learn after Python basics?
Focus on: file I/O, classes and OOP, exception handling patterns, then pick a domain — data engineering (pandas, PySpark), web (FastAPI), or automation (boto3, subprocess).
Wrapping Up
Python basics aren’t just for beginners — they’re the foundation every data engineer, ML engineer, and backend developer returns to. Nail variables, data types, loops, functions, and error handling, and you’ll be writing real, useful code quickly.
The best way to learn? Write code. Break things. Read error messages. Repeat.