
In recent years, we’ve worked with various Large Language Models (LLMs), experiencing firsthand how quickly this technology evolves and the opportunities that arise around it. In this article, we’d like to share some lessons we’ve learned from our experience, covering everything from cost evaluation to prompt engineering and fine-tuning techniques.

1. Understanding the Relationship Between Cost and Performance
At first, we were surprised by the wide range of model “sizes” offered by different providers. On one hand, there are lighter models, which tend to be faster and cheaper but also “less intelligent.” On the other, there are larger models with greater reasoning capacity, although they come with higher costs and slower performance.
We discovered that the cost/performance ratio can vary dramatically. For example, in our early days, GPT-3 achieved a given score on MMLU at a price of around $60 per million tokens. Only three years later, we saw much cheaper models matching that score for just $0.06 per million tokens, a price reduction of roughly a thousandfold.
This has been possible thanks to improvements in model architecture, compression and distillation techniques, as well as advances in hardware. We’ve also noticed the emergence of new competitors, like DeepSeek, which offer open-source solutions capable of delivering highly competitive performance at lower cost.
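To put those figures in perspective, here is a quick back-of-the-envelope calculation. Only the two prices come from the paragraph above; the daily token volume is a made-up figure for illustration, not our actual workload.

```python
# Back-of-the-envelope cost comparison using the prices quoted above
# and a hypothetical workload of 5 million tokens per day (illustrative only).

OLD_PRICE_PER_MTOK = 60.00   # USD per million tokens (GPT-3 era)
NEW_PRICE_PER_MTOK = 0.06    # USD per million tokens, comparable quality ~3 years later
DAILY_TOKENS_M = 5           # millions of tokens per day (assumed, not our real volume)

reduction_factor = OLD_PRICE_PER_MTOK / NEW_PRICE_PER_MTOK
old_monthly = OLD_PRICE_PER_MTOK * DAILY_TOKENS_M * 30
new_monthly = NEW_PRICE_PER_MTOK * DAILY_TOKENS_M * 30

print(f"Price reduction: {reduction_factor:.0f}x")           # 1000x
print(f"Monthly cost at old pricing: ${old_monthly:,.2f}")   # $9,000.00
print(f"Monthly cost at new pricing: ${new_monthly:,.2f}")   # $9.00
```

At that scale, a workload that would once have been a significant line item becomes almost negligible, which is exactly why we keep re-running this kind of calculation as prices move.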
2. Planning for the Future (But Moving Forward in the Present)
Observing how quickly both costs and model capabilities change, we learned not to lock ourselves into the cheapest model that happened to meet our needs at a given moment. At the same time, we had to process millions of tokens a day and show results as soon as possible, which pushed us toward the more affordable models available at the time.
We found that the most sensible strategy was to move forward with whatever worked best right now, while also preparing our infrastructure to integrate new models as they became more cost-effective or powerful. In fact, more than once we encountered a situation where something previously unviable suddenly became the best option just a few months later.
3. Modular and Agnostic Systems: Implementation Strategies
To manage costs without sacrificing quality, one tactic that worked well for us was to break down in detail the various tasks we wanted the LLM to perform. This allowed us to chain together requests to several lighter, specialized models. Although this required extra development effort, it gave us greater control and a better balance between cost and performance.
Likewise, having a system that was model-agnostic was crucial, so we could easily switch from one LLM to another when a cheaper or more efficient option appeared. This approach let us adopt new offerings quickly without needing to rebuild our entire architecture.
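To make the idea concrete, below is a minimal sketch of the kind of model-agnostic layer we mean. The class names, the complete() method, and the specific models are our own illustrative choices rather than a prescribed design; the point is that the rest of the application only depends on a tiny interface, so swapping providers or chaining lighter models stays cheap.

```python
from typing import Protocol

from openai import OpenAI   # assumes the official openai package
import anthropic            # assumes the official anthropic package


class LLMClient(Protocol):
    """The only interface the rest of the application depends on."""
    def complete(self, prompt: str) -> str: ...


class OpenAIChatClient:
    """Thin wrapper around an OpenAI chat model (model name is illustrative)."""
    def __init__(self, model: str = "gpt-4o-mini") -> None:
        self._client = OpenAI()
        self._model = model

    def complete(self, prompt: str) -> str:
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content


class AnthropicChatClient:
    """Thin wrapper around an Anthropic model (model name is illustrative)."""
    def __init__(self, model: str = "claude-3-5-haiku-latest") -> None:
        self._client = anthropic.Anthropic()
        self._model = model

    def complete(self, prompt: str) -> str:
        response = self._client.messages.create(
            model=self._model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text


def summarize_then_classify(document: str, summarizer: LLMClient, classifier: LLMClient) -> str:
    """Chain two cheap, specialized calls instead of one large-model call."""
    summary = summarizer.complete(f"Summarize the key points of this document:\n\n{document}")
    return classifier.complete(
        "Classify the following summary as one of: invoice, contract, report.\n\n" + summary
    )
```

Swapping providers then comes down to constructing a different client, and chaining lighter, specialized models is ordinary function composition rather than a rewrite of the pipeline.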
4. Fine-Tuning vs. Prompt Engineering: Practical Lessons
Another recurring question was whether it was worth applying fine-tuning or if careful prompt engineering would suffice.
Fine-Tuning
Adapting a specific model with your own data can be very beneficial when you need it to handle something highly specialized and you have high-quality information.
In our case, it was especially useful for formatting documents with HTML tags, where we needed a very specific outcome for each client.
However, it requires time and resources, and it can become obsolete if a cheaper generic model with similar capabilities appears.
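To illustrate the HTML-formatting use case mentioned above, here is roughly what one training example looks like, assuming an OpenAI-style chat fine-tuning dataset (one JSON object per line in a JSONL file). The content itself is invented for illustration; a real dataset needs many such examples per client.

```python
import json

# One invented training example for the HTML-formatting task, in the
# chat-style JSONL format used by OpenAI-style fine-tuning. A real
# dataset is a file with many such lines, one JSON object per line.
example = {
    "messages": [
        {"role": "system", "content": "Format the user's text using the client's HTML conventions."},
        {"role": "user", "content": "Quarterly results\nRevenue grew 12% year over year."},
        {"role": "assistant", "content": "<h2>Quarterly results</h2>\n<p>Revenue grew <strong>12%</strong> year over year.</p>"},
    ]
}

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```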
Prompt Engineering
This involves refining the instructions given to the model to guide its behavior.
It was particularly helpful in “pushing” somewhat less powerful models to perform complex tasks, thus optimizing the cost-efficiency ratio.
Since it’s not tied to any specific model, it was easier to reuse our work if we decided to switch providers.
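As an example of the kind of prompt that helped us push lighter models, the sketch below combines a clear role, step-by-step instructions, a strict output format, and one worked example embedded in the prompt. The extraction task, field names, and wording are hypothetical.

```python
# A hypothetical prompt illustrating the techniques that tend to help
# lighter models: explicit role, numbered steps, a strict output format,
# and one worked example (few-shot) inside the prompt itself.
EXTRACTION_PROMPT = """You are an assistant that extracts invoice data.

Follow these steps:
1. Read the invoice text carefully.
2. Identify the supplier name, invoice date, and total amount.
3. Answer ONLY with a JSON object using the keys: supplier, date, total.

Example
Invoice text: "ACME S.L. - 12/03/2024 - Total due: 1,250.00 EUR"
Answer: {{"supplier": "ACME S.L.", "date": "2024-03-12", "total": "1250.00 EUR"}}

Invoice text: "{invoice_text}"
Answer:"""


def build_prompt(invoice_text: str) -> str:
    """Fill the template with the document to process."""
    return EXTRACTION_PROMPT.format(invoice_text=invoice_text)

# Usage with the model-agnostic client sketched earlier (hypothetical):
# answer = lightweight_model.complete(build_prompt(raw_invoice_text))
```

Because the prompt lives in our own code rather than in the model, it travels with us when we switch providers.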
In short, fine-tuning offers highly focused precision in certain specific cases, while prompt engineering provides more flexibility at a lower initial cost.
5. Staying Agile and Reassessing Decisions
An important lesson was the willingness to revisit and abandon some previous developments when new solutions emerged that better suited our needs. Although it might seem like a step backward, our experience has shown that sometimes undoing part of what’s already built is the quickest way to evolve.
Practical Example
Before: To extract information from PDFs, we split the document into multiple sections and used lighter models to process each part.
Now: With the arrival of models like "gpt-4o-mini," which are both more affordable and more powerful, we can send the entire PDF in each request and obtain more precise answers, eliminating much of the preliminary processing (see the sketch below).
Deciding to abandon our embeddings-based system wasn’t easy, but in the long run, it turned out to be simpler and more effective.
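Here is a rough sketch of the "now" approach, assuming the text is extracted with the pypdf package and sent to gpt-4o-mini through the OpenAI Python client; the file name and question are placeholders, and in practice the document still has to fit within the model's context window.

```python
from pypdf import PdfReader   # assumes the pypdf package
from openai import OpenAI     # assumes the official openai package


def answer_question_about_pdf(pdf_path: str, question: str) -> str:
    """Send the whole document in a single request instead of splitting it
    into chunks and querying an embeddings index."""
    reader = PdfReader(pdf_path)
    full_text = "\n".join(page.extract_text() or "" for page in reader.pages)

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the document provided."},
            {"role": "user", "content": f"Document:\n{full_text}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# Example (placeholder file and question):
# answer_question_about_pdf("contract.pdf", "What is the termination notice period?")
```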
6. Looking Ahead: Agents and Open-Source Models
There’s been a lot of talk lately about the “era of agents,” a new paradigm in which multiple models and tools can be orchestrated to perform more complex tasks with some level of autonomy. From our perspective, this could mean restructuring applications to coordinate various intelligent components.
Meanwhile, the rise of increasingly powerful open-source models suggests that costs will continue to fall and the flexibility to customize these models will grow. We’re already considering how best to leverage the new options that appear practically every month.
Conclusion
After working extensively with various language models, we believe these are the key lessons we’ve learned:
Planning and Flexibility: Move forward with the viable options available today, while staying prepared to adopt future improvements.
Modular and Agnostic Architecture: Design systems that make it easy to switch models and take advantage of specialized functions.
Balancing Fine-Tuning and Prompt Engineering: The former offers highly targeted accuracy, while the latter allows more agility and less dependence on a single model.
Agility to Reevaluate: Don’t be afraid to let go of prior developments if a new approach can improve quality and reduce costs.
Staying Current: Agents and open-source models promise yet another shift in the sector, and it’s wise to be prepared.
In such a rapidly changing market, our biggest takeaway is that the ability to react quickly, keep learning, and remain open to adopting new solutions can make a major difference in staying competitive in the AI space.