what is gst merger?

gst merger is a solution I built at PragetX for a client dealing with the tedious task of merging multiple gst (goods and services tax) excel reports. having worked closely with finance teams, i understood the challenge of manually consolidating monthly reports into comprehensive summaries. this tool automates that entire process, saving hours of manual work and eliminating human error.

why i built it

dealing with gst reports can be overwhelming:

  • multiple excel files for different months
  • complex data structures with merged cells
  • need for consistency across reports
  • time-consuming manual consolidation

we wanted to create a solution that would make this process effortless for both technical and non-technical users. more details on a similar solution can be found in the official invoice utility case study.

key features

smart header detection

one of the most challenging aspects was detecting headers in gst reports, as they often contain merged cells and multi-level headers. i implemented a clever algorithm that:

def find_headers(xlsx_files):
    if len(xlsx_files)>0:
        wb = openpyxl.load_workbook(xlsx_files[0])
        other_wbs = [ openpyxl.load_workbook(xlsx_file) for xlsx_file in xlsx_files[1:] ]
        sheets = wb.sheetnames
        for sheet in sheets:
            merged_cells = get_sorted_merged_cells(wb, sheet)
            max_row, max_col = wb[sheet].max_row, wb[sheet].max_column
            header_num_rows = -1
            # intelligently detects where headers end by comparing values across files

seamless merged cell handling

handling merged cells was tricky as they needed to be preserved while combining data. i developed a robust solution:

def get_merged_cell_from_coord(row1, col1, row2, col2):
    return get_merged_cell_str(convert_to_coord(row1, col1), convert_to_coord(row2, col2))
 
def is_merged_cell(merged_cells, cell):
    for index in range(len(merged_cells)):
        if (cell.row >= merged_cells[index].min_row and 
            cell.row <= merged_cells[index].max_row and 
            cell.column >= merged_cells[index].min_col and 
            cell.column <= merged_cells[index].max_col):
            return True, index
    return False, -1

automatic column width optimization

to make the output excel readable, i implemented smart column width adjustment:

for idx, col in enumerate(sheet1):
    series = sheet1[col]
    max_in_series = series.astype(str).map(len).max()
    max_len = min(max(max_in_series, len(str(series.name))) + 3, 50)
    writer.sheets[sheets[0]].set_column(idx, idx, max_len)

user interface

i chose gradio for creating the interface because:

  • it’s lightweight yet powerful
  • provides an intuitive drag-and-drop interface
  • works seamlessly in web browsers
  • easily deployable as a desktop application

here’s how i styled the interface for a professional look:

with gr.Blocks() as demo:
    gr.Markdown("<h1 style='text-align: center;'>GST Merger</h1>" + 
                "<h2 style='text-align: center;'>Upload Excel Files to Merge</h2>")
    files = gr.Files(label="Upload Documents and Medical Reports", 
                    type="filepath", 
                    file_types=["xlsx"])

how to run

from source

  1. clone the repository
  2. create a virtual environment:
python -m venv .venv
source .venv/bin/activate  # on linux/mac
# or
.venv/Scripts/activate    # on windows
  1. install dependencies:
pip install -r requirements.txt
  1. run the application:
python main.py

as executable

i made the application distributable as a standalone executable, eliminating the need for python installation:

  1. generate spec file:
pyi-makespec --collect-data=gradio_client --collect-data=gradio --onefile main.py
  1. modify spec for gradio compatibility:
a = Analysis(
    ['main.py'],
    ...
    module_collection_mode={
        'gradio': 'py',  # collect gradio as source files
    },
)
  1. create executable:
pyinstaller main.spec

technical insights

threading for better ux

i implemented threading to ensure the ui remains responsive:

if __name__ == "__main__":
    demo_thread = threading.Thread(target=demo.launch)
    demo_thread.start()
    webbrowser.open('http://localhost:7860')
    demo_thread.join()

efficient data processing

  • used pandas for high-performance data manipulation
  • implemented smart caching of excel workbooks
  • optimized memory usage for large files
  • preserved excel formatting and merged cells

where to use it

this tool is perfect for:

  • accountants handling multiple gst reports
  • businesses with multiple gst registrations
  • finance teams consolidating monthly reports
  • anyone dealing with bulk gst excel merging

future enhancements

i’m planning to add:

  • support for more gst report formats
  • automated error detection in reports
  • custom template support
  • batch processing capabilities
  • export in multiple formats

conclusion

this project showcases how automation can significantly improve workflow efficiency. by focusing on user experience and robust data handling, we created a tool at pragetx that transforms a tedious task into a simple drag-and-drop operation. you can learn more about our work on similar solutions through our case study and client testimonials by Mr. D. Thakkar.


Links :

Tags :

Date : 7th April, Monday, 2025, (Wikilinks: 7th April, April 25, April, 2025. Monday)

Category : Others