I'm working on a script that processes an Excel file containing product information. The data includes both product codes (which are 5-digit numeric values) and quantities, but these values are sometimes in the same cells. I need to distinguish between product codes and quantities correctly. The issue is complicated because sometimes the quantity is also a 5-digit number, and product codes may appear in cells where quantities would be.
What I'm Trying to Do:
I have an Excel file with multiple sheets. Each sheet contains product data, and the product codes and quantities are in different columns but often appear in the same cells for some rows. I want to extract the product codes and quantities correctly. Product codes are 5-digit numbers, while quantities are numeric and can also sometimes be 5-digit numbers (e.g., 1.5, 100, or 12345).
The issue arises because I’m using a function to identify 5-digit product codes, but in some cases, this function mistakenly identifies quantities as product codes, especially when the quantity is also a 5-digit number.
The Issue:
The problem is that product codes are sometimes mixed with quantities in the same cell, and some quantities are also 5-digit numbers. My current approach is checking for 5-digit numbers and assuming those are product codes, but this leads to mistakes when the quantity is also a 5-digit number.
What I've Tried:
Here's the script I’ve written to extract product data:
import pandas as pd
file_path = "C:\\Users\\user\\Downloads\\afps1.xlsx" # Update with your file path
xls = pd.ExcelFile(file_path)
# Extract company info
df_info = xls.parse("Co. Info", header=None)
company_info = {
"Company Name": df_info.iloc[13, 2] if len(df_info) > 13 else None,
"Mailing Address": df_info.iloc[14, 2] if len(df_info) > 14 else None,
"Plant Name": df_info.iloc[15, 2] if len(df_info) > 15 else None,
"Plant Address": df_info.iloc[16, 2] if len(df_info) > 16 else None
}
# Sheets to process
seafood_sheets = ["Wild Salmon", "Wild Fish & Shellfish", "CulturedSalmon&Fish", "Cultured Shellfish"]
data_rows = []
# Function to check if a value is a product code (5-digit numeric)
def is_product_code(value):
if pd.isna(value):
return False # Return False if value is NaN
return isinstance(value, (int, float)) and len(str(int(value))) == 5
for sheet in seafood_sheets:
df = xls.parse(sheet, header=None)
current_species_left = None
current_species_right = None
for idx, row in df.iterrows():
left_col = str(row[0]).strip() if not pd.isna(row[0]) else ""
right_col = str(row[5]).strip() if len(row) > 5 and not pd.isna(row[5]) else ""
# Detect species section headers
if left_col.isupper() and " - " in left_col:
current_species_left = left_col
continue
if right_col.isupper() and " - " in right_col:
current_species_right = right_col
continue
# Process left table
if current_species_left and left_col:
product_code = row[2] if not pd.isna(row[2]) and is_product_code(row[2]) else None
quantity = row[3] if not is_product_code(row[3]) and not pd.isna(row[3]) else None
value = row[4] if not pd.isna(row[4]) else None
if quantity or value:
data_rows.append({
"Species": current_species_left,
"Product Code": product_code,
"Product": left_col,
"Quantity": quantity,
"Value": value,
"Company Name": company_info["Company Name"],
"Mailing Address": company_info["Mailing Address"],
"Plant Name": company_info["Plant Name"],
"Plant Address": company_info["Plant Address"],
"Source Sheet": sheet
})
# Process right table
if current_species_right and right_col:
product_code = row[7] if len(row) > 7 and not pd.isna(row[7]) and is_product_code(row[7]) else None
quantity = row[8] if len(row) > 8 and not is_product_code(row[8]) and not pd.isna(row[8]) else None
value = row[9] if len(row) > 9 and not pd.isna(row[9]) else None
if quantity or value:
data_rows.append({
"Species": current_species_right,
"Product Code": product_code,
"Product": right_col,
"Quantity": quantity,
"Value": value,
"Company Name": company_info["Company Name"],
"Mailing Address": company_info["Mailing Address"],
"Plant Name": company_info["Plant Name"],
"Plant Address": company_info["Plant Address"],
"Source Sheet": sheet
})
# Convert to DataFrame and save
final_df = pd.DataFrame(data_rows)
output_file = "processed_seafood_data.xlsx"
final_df.to_excel(output_file, index=False)
What I Need Help With:
How can I more accurately distinguish between product codes and quantities when they are both numeric and sometimes both appear as 5-digit numbers?
Is there a better method or logic I can use to identify these values correctly?
Any help would be greatly appreciated! Thank you in advance.
See image of excel doc
发布者:admin,转转请注明出处:http://www.yc00.com/questions/1744228880a4564145.html
评论列表(0条)