Data wrangling with Excel and Pandas is actually quite useful tool in the belt of any Excel professional, financial professional, data analyst or a developer. Really, everyonecan benefit from the well defined libraries that ease people’s lifes. These are the libraries used:
|
import pandas as pd # Main data manipulation
from openpyxl import Workbook # Excel writing
from openpyxl.styles import Font # Excel formatting (bold, colors)
import glob # File path handling
from datetime import datetime
|
Additionally, a function for making a unique Excel name is used:
|
def make_unique_name():
timestamp = datetime.now().strftime(‘%Y%m%d_%H%M%S’)
return f‘{timestamp}__report.xlsx’
|
An example of the video, where Jupyter Notebook is used.
In the YT video below, the following 8 points are discussed:
# Trick 1 – Simple reading of worksheet from Excel workbook
|
excel_file_name = “financial_data.xlsx”
df = pd.read_excel(excel_file_name,
sheet_name = “Transactions”,
parse_dates = [“Date”],
dtype={“InvoiceID”:str})
|
# Trick 2 – Combine Reports
|
income = pd.read_excel(excel_file_name, sheet_name=“Income”)
expenses = pd.read_excel(excel_file_name, sheet_name=“Expenses”)
combined = pd.concat([
income.assign(From_Worksheet=“Income”),
expenses.assign(From_Worksheet=“Expenses”)
])
|
# Trick 3 – Fix Missing Values
|
combined[“Amount”] = combined[“Amount”].fillna(combined[“Amount”].mean())
|
# Trick 4 – Formatting the exported Excel file
|
with pd.ExcelWriter(new_worksheet, engine=“openpyxl”) as writer:
combined.to_excel(writer, index=False)
workbook = writer.book
worksheet=writer.sheets[“Sheet1”]
for cell in worksheet[“1:1”]:
cell.font = Font(bold=True)
cell.font = Font(color=“FFFF22”)
|
# Trick 5 – Merging Excel Files
|
files = glob.glob(“sales12/sales_*.xlsx”)
annual_data = pd.concat([pd.read_excel(f) for f in files])
|
# Trick 6 – Smart Filtering
|
web_design_only = annual_data[
(annual_data[“Description”]==“Web Design”
)]
small_transactions = annual_data[
(annual_data[“Amount”] < 200
)]
|
# Trick 7 – Mergining Tables
|
df_transactions = pd.read_excel(
excel_file_name,
sheet_name=“Transactions”)
df_customers = pd.read_excel(
excel_file_name,
sheet_name=“Customers”)
merged = pd.merge(
df_transactions,
df_customers,
on = “CustomerID”
)
|
# Trick 8 – Export Dataframe to Excel
|
with pd.ExcelWriter(new_worksheet, engine=“openpyxl”) as writer:
merged.to_excel(writer)
|
The whole code with the Excel files is available in GitHub here.
https://www.youtube.com/watch?v=SXXc4WySZS4
Enjoy it!