Generative AI Reasoning: Tackling Complex Queries

As organizations grow in size and complexity, it is essential for business leaders to leverage the power of automation to enable timely, data-driven decision-making. Instead of spending engineering resources building a traditional dashboard full of data, we set out to test our hypothesis around the potential value of generative AI reasoning: “Can AI help me answer data-related questions simply by having a conversation about it?”
We’ve launched a generative AI project focused on translating natural language questions into SQL queries, offering data-driven answers through machine intelligence. For precision, we include the entire database schema in the prompt rather than using vector retrieval, which can cause issues by treating the schema as searchable data instead of as structure. We aim to refine AI applications within our processes by leveraging automation and intelligent systems.
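To make the schema-in-prompt idea concrete, here is a minimal sketch of how a prompt can carry the entire schema alongside the user’s question. The schema, question, and helper name are illustrative placeholders, not our production setup:

```python
# Sketch: embed the full database schema in the prompt rather than
# retrieving schema fragments via vector search. The schema below is a
# hypothetical example, not our production schema.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_entries (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES projects(id),
    user_name TEXT,
    hours REAL,
    entry_date TEXT
);
"""

def build_sql_prompt(question: str, schema: str = SCHEMA) -> str:
    """Compose a single prompt containing the whole schema and the question."""
    return (
        "You are a SQL assistant. Use ONLY the tables below.\n\n"
        f"Database schema:\n{schema}\n"
        f"Question: {question}\n"
        "Return a single valid SQL query."
    )

prompt = build_sql_prompt("How many hours were logged per project in December 2024?")
```

Because the model sees every table and column in one context, it never has to guess which schema fragments are relevant, which is the failure mode we saw with retrieval.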
In this post, you’ll find an overview of:
- Artificial intelligence reasoning types
- AI reasoning models to consider
- The AI reasoning model that was most appropriate for our use case
Exploring Artificial Intelligence Reasoning Types and the o3-mini Model
Traditionally, human reasoning was required for analyzing data and producing comprehensive reports to guide business decision-making. Now, reasoning is shifting toward artificial intelligence, which understands the questions we need answered. Generative AI models are evolving from giving plausible-sounding answers to providing responses grounded in truth. In this blog, we’ll focus on how these advances in Generative AI reasoning are transforming data analysis.
Not all Generative AI models are equal. Only certain models possess the advanced reasoning and automation capabilities necessary to generate valid SQL queries. Understanding these differences is crucial for selecting the right model and assessing its performance.
Generative AI and LLMs: Inherent Reasoning Types
Generative artificial intelligence and Large Language Models (LLMs) primarily use probabilistic reasoning and inductive reasoning.
- Probabilistic Reasoning: Probabilistic reasoning involves making inferences based on the likelihood of certain events or outcomes, which is useful in managing uncertainty and variability in data sets and data analysis. It refines AI algorithms and their natural language processing, problem-solving, pattern-recognition, and automation capabilities by quantifying uncertainty.
- Inductive Reasoning: Inductive reasoning draws generalized conclusions from specific observations. This helps our artificial intelligence algorithms with pattern recognition, natural language processing, knowledge representation, and forecasting abilities, enhancing adaptability and efficiency in query generation.
Non-Inherent Reasoning Types
LLMs can simulate other computational reasoning types but don’t inherently use them:
- Causal Reasoning: Causal reasoning refers to cause-and-effect relationships, vital for AI algorithms to determine variable impacts, and to build intelligent systems capable of this type of processing.
- Abductive Reasoning: Abductive reasoning refers to forming hypotheses from incomplete data, useful in AI applications and AI algorithms for scenario deduction.
- Deductive Reasoning: Deductive reasoning applies general rules to specific cases, enabling AI algorithms to prevent logical errors within AI settings.

The Reasoning of OpenAI o3-mini
While o3-mini simulates other computational reasoning forms, its core operations remain probabilistic and inductive. Its design supports computational reasoning, suitable for STEM tasks in intelligent systems, enhancing data handling abilities via:
- Inductive Reasoning: Ideal for drawing general conclusions from specific instances.
- Probabilistic Reasoning: Utilizes statistical pattern recognition from training data.
Additional features include chain-of-thought reasoning (step-by-step problem-solving) and deliberative alignment for enhanced decision-making. These contribute to machine intelligence, boosting comprehension and reliability.
AI Models: A Comparison of Reasoning Capabilities
An exemplary use case from our Generative AI reasoning experiments is briefly elaborated below, framed by three questions:
- What types of data are we working with?
- What complex query problem-solving challenges are we trying to tackle?
- What would be the advantage of using the LLM computational reasoning approach instead of the conventional SQL rule-based queries?
The problem we aim to address: rather than implementing new features or user interfaces for every request, we recognize that some user requests are rare or only needed once. Therefore, to create truly intelligent systems, we want to use AI and machine intelligence to empower users to generate dynamic SQL queries based on their specific needs. This approach also lets us evaluate user requests before deciding whether or not to implement them as features.
We examined several models to explore the concept of employing Large Language Models (LLMs) to generate unrestricted SQL queries for data insights. The models evaluated included GPT-4o, o1, Gemini 1.5 Pro, Gemini 2.0 Flash Experimental, and Anthropic Claude 3.5 Sonnet.
In production, we rely on OpenAI’s o1 to generate SQL queries because of its quality and consistency, and we use Gemini 2.0 Flash Experimental to convert the query results into answers because of its speed in generating output tokens. Neural networks and cognitive computing support our AI models in logical processing, ensuring robust automation in our operations.
Our evaluation of the o3-mini involved using our two-part system to handle a challenging question requiring the AI model to create a complex query. The process involves:
- Crafting an SQL query based on the user’s inquiry from the provided database schema.
- Converting the query results into a clear and understandable answer for the user.
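The two steps above can be sketched as a small pipeline. The `llm()` function here is a stub standing in for the real model calls (o1 for SQL generation, Gemini for answer conversion); function names and row shapes are illustrative assumptions:

```python
# Sketch of the two-part flow: (1) generate SQL from the user question and
# schema, (2) convert the query results into a readable answer.
def llm(prompt: str) -> str:
    # Placeholder: a real implementation would call the model API here.
    return "SELECT user_name, SUM(hours) AS total FROM time_entries GROUP BY user_name"

def question_to_sql(question: str, schema: str) -> str:
    # Step 1: craft an SQL query from the inquiry and the database schema.
    return llm(f"Schema:\n{schema}\n\nWrite SQL for: {question}")

def rows_to_answer(question: str, rows: list) -> str:
    # Step 2: turn raw database records into a clear answer (stubbed as
    # simple formatting; in practice this is a second model call).
    lines = [f"- {r['user_name']}: {r['total']} hours" for r in rows]
    return f"Answer to '{question}':\n" + "\n".join(lines)

sql = question_to_sql("Total hours per user?", "time_entries(user_name, hours)")
answer = rows_to_answer("Total hours per user?", [{"user_name": "alice", "total": 12.5}])
```

Splitting generation and summarization into two calls is what lets us mix models, using the stronger reasoner only where the query is actually crafted.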
Our guiding question asked each AI model to generate a markdown table with a per-user breakdown comparing total forecasted hours and actual hours on a specific project in December 2024, a task that exercises abductive reasoning. Implementing hybrid reasoning and causal reasoning allows the AI model to refine its inferential capabilities significantly.

Reasoning Capabilities of the o3-mini Model
For the o3-mini model, you must specify the level of reasoning effort, which will influence both the quality of the answers and the response speed. A higher effort level enhances the answer quality but also results in longer response times.
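For illustration, the effort level is passed as a request parameter. The sketch below builds the request payload rather than calling the API; the field names assume OpenAI’s chat completions request shape with the `reasoning_effort` setting:

```python
# Sketch: an o3-mini request with an explicit reasoning-effort level.
# Payload is constructed but not sent; shape assumes OpenAI's chat
# completions format ("low" / "medium" / "high" effort).
def build_request(prompt: str, effort: str = "medium") -> dict:
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # higher effort: better answers, slower
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Generate the SQL query for the question below...", effort="high")
```

In our trials we compared the `high` and `medium` settings, described next.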
High Reasoning (slower but powerful)
- SQL: Query generation takes slightly longer (20-45 seconds) than o1 (around 20-30 seconds). Even with an example, incorrect queries are sometimes generated. There’s a 50% chance of getting a correct query without an example.
- Text: Converting database records to answers takes about the same time as o1, but the answer quality is much higher.
- In the summary section, the o3-mini model’s response is more detailed and better summarized than o1’s, making the explanation easier to understand.
Medium Reasoning
- SQL: Queries are generated much faster (5-15 seconds) than o1, and with an example provided, no incorrect queries are produced. Without an example, queries often result in incorrect data analysis or errors, although simple queries (like summarizing time entries) are handled correctly.
- Text: The time taken to convert database records is similar to Gemini-2.0-flash-exp. Yet the quality of the answers is superior. However, it’s crucial to consider that the smaller input token capacity might present a potential challenge.
| Model | Input tokens | Output tokens |
| --- | --- | --- |
| o3-mini | 200,000 | 100,000 |
| gemini-2.0-flash | 1,048,576 | 8,192 |
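The input-capacity gap matters because the second step feeds raw query results back into the model. A rough sketch of checking a prompt against these limits, using a crude 4-characters-per-token estimate in place of a real tokenizer:

```python
# Rough sketch: decide whether a prompt fits a model's input limit.
# The chars/4 token estimate is a coarse heuristic; a production system
# would use the model's actual tokenizer.
LIMITS = {
    "o3-mini": {"input": 200_000, "output": 100_000},
    "gemini-2.0-flash": {"input": 1_048_576, "output": 8_192},
}

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits(model: str, prompt: str) -> bool:
    return estimate_tokens(prompt) <= LIMITS[model]["input"]

big_result_set = "row," * 300_000   # roughly 300k estimated tokens of records
small_answer = "total hours: 42"
```

A 300k-token result set would overflow o3-mini’s input window while still fitting comfortably in gemini-2.0-flash, which is the “smaller input token capacity” concern noted above.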
Low Reasoning
Low reasoning was skipped, as Medium reasoning didn’t meet expectations. However, low reasoning might be suitable for simple problem-solving, similar to GPT-4o.
Determining the Most Appropriate Generative AI Reasoning Model for Our Use Case
High reasoning effort is suitable for research or processing tasks within AI applications that do not require fast response times. Medium reasoning effort is better suited for common tasks, or tasks that require quick responses and do not involve large amounts of data or heavy data analysis and pattern recognition.
There are a few key technical assessment metrics for our use case: query response time, query generation consistency, and token length capacity. Based on this small trial, the o3-mini model was not selected for our project at this time: with high reasoning effort, query generation was slower and inconsistent even when example queries were provided, and with medium reasoning effort we would prefer a larger token capacity than the model offers.
Going forward, we either need to run more extensive trials on o3-mini with additional prompt optimization work, or look for other tooling with better performance. For now, we have decided to wait for the GPT-4.5 or GPT-5 models and run another trial at a later time, trusting that frontier AI teams will keep making LLMs better, and soon. Besides building Generative AI applications centered on the latest LLMs, our work includes explainable AI to enhance inference engines, ensuring transparency and reliability in our AI applications and expert systems. The journey toward artificial general intelligence continues, with intelligent systems, sophisticated pattern recognition, symbolic AI, and expert systems leading the way in solving complex problems.
Stay tuned, till next time.
Author’s Note: Special thanks to Steve Yin for his expertise and collaboration in producing this article.