what is gst merger?
gst merger is a solution I built at PragetX for a client dealing with the tedious task of merging multiple gst (goods and services tax) excel reports. having worked closely with finance teams, i understood the challenge of manually consolidating monthly reports into comprehensive summaries. this tool automates that entire process, saving hours of manual work and eliminating human error.
why i built it
dealing with gst reports can be overwhelming:
- multiple excel files for different months
- complex data structures with merged cells
- need for consistency across reports
- time-consuming manual consolidation
we wanted to create a solution that would make this process effortless for both technical and non-technical users. more details on a similar solution can be found in the official invoice utility case study.
key features
smart header detection
one of the most challenging aspects was detecting headers in gst reports, as they often contain merged cells and multi-level headers. i implemented a clever algorithm that:
def find_headers(xlsx_files):
if len(xlsx_files)>0:
wb = openpyxl.load_workbook(xlsx_files[0])
other_wbs = [ openpyxl.load_workbook(xlsx_file) for xlsx_file in xlsx_files[1:] ]
sheets = wb.sheetnames
for sheet in sheets:
merged_cells = get_sorted_merged_cells(wb, sheet)
max_row, max_col = wb[sheet].max_row, wb[sheet].max_column
header_num_rows = -1
# intelligently detects where headers end by comparing values across files
seamless merged cell handling
handling merged cells was tricky as they needed to be preserved while combining data. i developed a robust solution:
def get_merged_cell_from_coord(row1, col1, row2, col2):
return get_merged_cell_str(convert_to_coord(row1, col1), convert_to_coord(row2, col2))
def is_merged_cell(merged_cells, cell):
for index in range(len(merged_cells)):
if (cell.row >= merged_cells[index].min_row and
cell.row <= merged_cells[index].max_row and
cell.column >= merged_cells[index].min_col and
cell.column <= merged_cells[index].max_col):
return True, index
return False, -1
automatic column width optimization
to make the output excel readable, i implemented smart column width adjustment:
for idx, col in enumerate(sheet1):
series = sheet1[col]
max_in_series = series.astype(str).map(len).max()
max_len = min(max(max_in_series, len(str(series.name))) + 3, 50)
writer.sheets[sheets[0]].set_column(idx, idx, max_len)
user interface
i chose gradio for creating the interface because:
- it’s lightweight yet powerful
- provides an intuitive drag-and-drop interface
- works seamlessly in web browsers
- easily deployable as a desktop application
here’s how i styled the interface for a professional look:
with gr.Blocks() as demo:
gr.Markdown("<h1 style='text-align: center;'>GST Merger</h1>" +
"<h2 style='text-align: center;'>Upload Excel Files to Merge</h2>")
files = gr.Files(label="Upload Documents and Medical Reports",
type="filepath",
file_types=["xlsx"])
how to run
from source
- clone the repository
- create a virtual environment:
python -m venv .venv
source .venv/bin/activate # on linux/mac
# or
.venv/Scripts/activate # on windows
- install dependencies:
pip install -r requirements.txt
- run the application:
python main.py
as executable
i made the application distributable as a standalone executable, eliminating the need for python installation:
- generate spec file:
pyi-makespec --collect-data=gradio_client --collect-data=gradio --onefile main.py
- modify spec for gradio compatibility:
a = Analysis(
['main.py'],
...
module_collection_mode={
'gradio': 'py', # collect gradio as source files
},
)
- create executable:
pyinstaller main.spec
technical insights
threading for better ux
i implemented threading to ensure the ui remains responsive:
if __name__ == "__main__":
demo_thread = threading.Thread(target=demo.launch)
demo_thread.start()
webbrowser.open('http://localhost:7860')
demo_thread.join()
efficient data processing
- used pandas for high-performance data manipulation
- implemented smart caching of excel workbooks
- optimized memory usage for large files
- preserved excel formatting and merged cells
where to use it
this tool is perfect for:
- accountants handling multiple gst reports
- businesses with multiple gst registrations
- finance teams consolidating monthly reports
- anyone dealing with bulk gst excel merging
future enhancements
i’m planning to add:
- support for more gst report formats
- automated error detection in reports
- custom template support
- batch processing capabilities
- export in multiple formats
conclusion
this project showcases how automation can significantly improve workflow efficiency. by focusing on user experience and robust data handling, we created a tool at pragetx that transforms a tedious task into a simple drag-and-drop operation. you can learn more about our work on similar solutions through our case study and client testimonials by Mr. D. Thakkar.
Links :
Tags :
Date : 7th April, Monday, 2025, (Wikilinks: 7th April, April 25, April, 2025. Monday)
Category : Others