I'm sure you've all seen the recent chaos that DeepSeek has caused in the US market over the past few weeks. Despite being developed by a much smaller team with a fraction of the funding, DeepSeek showed the world that enormous resources aren't required to produce a market-leading product. Grok 3, on the other hand, went in the opposite direction: trained on around 200,000 Nvidia H100 GPUs at an unfathomable cost, it's all about pushing state-of-the-art technology as far as it can go.
The fallout wiped out nearly a trillion dollars of market value among the top US firms and sent shock waves through Nvidia in particular (DeepSeek uses their chips), dropping its value by 17% (close to $600 billion) as investors began to panic, realising these models may not need such significant investment to be created. DeepSeek claimed it took just two months and under $6 million to bring the app, as we see it now, to market.
DeepSeek is built on a system divided into smaller sub-models, where each sub-model is activated only when a particular piece of knowledge is relevant or brought up. Rather than the whole model being used to carry out every task, smaller parts of the system, each categorised into a specific area, do the work. It's like having specialist teachers for specific subjects rather than one person who teaches you everything. The sketch below gives a rough feel for how that routing idea works.
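To make the "specialist teachers" analogy concrete, here's a deliberately simplified Python sketch of how a Mixture of Experts (MoE) style router behaves. This is not DeepSeek's code; the expert names and keyword gate are made up for illustration, whereas a real MoE uses a learned gating network inside the model.

```python
# Toy illustration of Mixture of Experts (MoE) routing, NOT DeepSeek's implementation.
# A gate scores each "expert" for the incoming request and only the top-scoring
# experts run, so most of the system stays idle for any single query.

EXPERTS = {
    "maths":   lambda q: f"[maths expert] answering: {q}",
    "coding":  lambda q: f"[coding expert] answering: {q}",
    "history": lambda q: f"[history expert] answering: {q}",
}

def gate(question: str) -> dict:
    """Crude keyword-based gate standing in for a learned routing network."""
    keywords = {
        "maths":   ["sum", "integral", "equation"],
        "coding":  ["python", "bug", "function"],
        "history": ["war", "empire", "century"],
    }
    return {name: sum(word in question.lower() for word in words)
            for name, words in keywords.items()}

def answer(question: str, top_k: int = 1) -> list[str]:
    scores = gate(question)
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Only the selected experts are "activated"; the rest do no work.
    return [EXPERTS[name](question) for name in chosen]

print(answer("Why does my Python function throw a bug?"))
```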
And then came Grok 3 in mid-February. This article was intended to be published within a week of DeepSeek launching; however, because DeepSeek restricted API access, we couldn't publish our integration, so we went with Grok 3 instead. This still demonstrates our flexibility and adaptability, and shows how quickly we can bring another app (Grok) to the marketplace.
DeepSeek and Grok (xAI plans to open-source the previous Grok version) are open source models, meaning they're publicly accessible and anyone can use them without paying for a license. DeepSeek's API (Application Programming Interface, which allows two different applications to talk to each other) is priced roughly 20 times cheaper than the ChatGPT API, making it accessible to pretty much everyone without sacrificing the features or capabilities found in paid models. Plus, the source code is available for everyone to see and use for themselves.
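As a rough illustration of what "talking to an API" looks like in practice, here's a minimal Python sketch using the openai client library against an OpenAI-compatible endpoint. The base URL, model name and key below are placeholders and assumptions, so check each provider's own documentation for the current values before using them.

```python
# Minimal sketch of calling an OpenAI-compatible chat API.
# The base_url and model name are illustrative assumptions --
# always confirm them against the provider's API docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed model name
    messages=[{"role": "user",
               "content": "Summarise what an API is in one sentence."}],
)
print(response.choices[0].message.content)
```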
Open source software comes with several advantages, the biggest being cost effectiveness: it's typically free and doesn't require licensing fees. Since the code is open to the public, developers can modify and customise it to fit their specific needs. This openness also encourages community collaboration, where users contribute documentation, bug fixes, and overall improvements to the models.
That said, open source models have some drawbacks. One major downside is the lack of dedicated technical support; instead, users rely on the community for troubleshooting, which can be tricky for those without technical experience. Open source models also might not include the exact features users need, meaning customisation is often required, though at least it's possible.
On the other hand, closed source software, where the source code is private and only accessible to those who pay for a license, comes with its own advantages. It typically offers expert technical support, predefined features, and stronger security measures to prevent attacks. It's also designed to integrate easily with existing systems, making the user experience straightforward.
However, closed source software has its downsides too. It's expensive, users rely on the vendor for updates and long-term support, and there's no way to modify it to fit specific needs.
We then created three different blog posts, each written by a different model, using the same prompt and data to see which one would deliver the best response.
| Features | DeepSeek | ChatGPT | Gemini | Grok 3 | Claude 3.7 |
| --- | --- | --- | --- | --- | --- |
| Developer | DeepSeek AI | OpenAI | Google DeepMind | xAI | Anthropic |
| Model Type | Open source LLM | Proprietary LLM | Proprietary multimodal model | Family of LLMs and LRMs | LLM |
| Architecture | DeepSeek-V3 Mixture of Experts (MoE) | Transformer based (GPT-4) | Multimodal Transformer | - | "Hybrid reasoning" |
| Performance | Efficient, cost effective, deep think | Strong in simple tasks and text coherence; great writer | Advanced multimodal capabilities | Expert in nuanced analysis, deep domain expertise, and strategic insights | Coding prowess, reasoning and maths, speed |
| Strengths | Coding, technical tasks, showing the working and reasoning | Best for conversational AI, coding assistance and creative writing | Best for image, video, audio and text processing | "Real time" DeepSearch, complex high-level code, advanced reasoning | Hybrid reasoning, coding excellence, visible reasoning, context handling, practical focus |
| Limitations | Limited multimodal capabilities (taking text, image and audio and converting into any output type); training data only goes up to October 2023; no memory retention; no real-time learning | Limited real-time capabilities; dependent on training data; responses are based on learned behaviour (June 2024 knowledge cut-off) | Based on the dataset it was trained on, "may have bias issues in complex reasoning"; no real-time access (August 2024 knowledge cut-off) | Humour, low-quality images, over-sensitive to ethical dilemmas, limited real-world testing | Pricier than others, reasoning faithfulness, not real time, conservative edge |
| Pricing | Free and open source | GPT-4o is free; $20/m limited access; $200/m unlimited access | Gemini 2.0 Flash is free; Advanced $20/m | Grok 3 accessed through X Premium+ at $22/m or $229/year | Free tier with restrictions; Claude Pro $20/m |
We played around with each of the apps, and across a series of different use cases we put the same prompt into each model to see how the outcomes would differ.
Essentially, the integration we built lets you send a message to an AI model (ChatGPT, Gemini, DeepSeek or Grok) using data from your CRM record and write the response back to a specified property, e.g. a custom property on a company record. You can ask it anything you normally would when using these tools, and it will put its response into whichever property you choose. The idea is to use these models to help you make the most of your HubSpot experience. The sketch below shows the rough shape of that flow.
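For readers who want a feel for the mechanics, here's a rough Python sketch of that record-to-model-to-property flow using HubSpot's CRM v3 objects API and an OpenAI-compatible client. This is not our integration's actual code; the property names, prompt, model choice and record ID are illustrative assumptions.

```python
# Rough sketch: pull data from a HubSpot company record, send it to an AI model,
# and write the reply back into a custom property. Illustrative only -- property
# names, the prompt and the model call are assumptions, not our production code.
import requests
from openai import OpenAI

HUBSPOT_TOKEN = "YOUR_PRIVATE_APP_TOKEN"
HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}"}
BASE = "https://api.hubapi.com/crm/v3/objects/companies"

def run_prompt_on_company(company_id: str, prompt: str, target_property: str) -> str:
    # 1. Read the record from HubSpot's CRM v3 API.
    record = requests.get(
        f"{BASE}/{company_id}", headers=HEADERS,
        params={"properties": "name,domain"},
    ).json()

    # 2. Ask the model, giving it the CRM data as context.
    client = OpenAI(api_key="YOUR_AI_API_KEY")
    reply = client.chat.completions.create(
        model="gpt-4o",  # swap for whichever model you're testing
        messages=[{"role": "user",
                   "content": f"{prompt}\n\nCompany data: {record.get('properties')}"}],
    ).choices[0].message.content

    # 3. Write the answer back to the chosen custom property.
    requests.patch(
        f"{BASE}/{company_id}", headers=HEADERS,
        json={"properties": {target_property: reply}},
    )
    return reply

# Hypothetical usage:
# run_prompt_on_company("12345", "Summarise recent news about this company", "ai_summary")
```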
We tested a handful of use cases across each of the models to see the difference and variation in outputs, and to work out which model would be most appropriate to integrate into your workflows, as well as which model would be the best fit for different scenarios.
We then had each model write its own blog post discussing the outputs of the prompts given to it, then fed all three posts back into each model and asked which one it thought was best.
Here's the prompt given to each of the models to create their own blog based on the use case inputs and outputs:
“Can you write me a blog about these prompts being put into our ChatGPT integration on HubSpot, can you discuss the outputs and answers and talk pros and cons about what you gave me, what your thoughts on the answers are, how well you did at answering them and if you think it adds value to the HubSpot experience”
With DeepSeek restricting their API, it raises the question of whether what they did was really something the western world should have worried so much about: all that hype, for what? As a result, we chose Grok to conduct the same tests, as it too has been turning heads in the AI world. (As this was being published, Claude 3.7 had just been released.)
We came up with 5 different use cases for these apps to carry out, which we believed would be most valuable for enhancing your HubSpot experience and data. We chose 'Apple' due to it being a very well known brand name, and we believed it would be the easiest for the models to talk about and find information on for some of the prompts.
ChatGPT Pros and Cons
The first thought on ChatGPT’s responses to each prompt is that they’re very basic and straight to the point. It does exactly what you ask and not much more.
Some positives are that it's clear, easy to read, and simple to understand. The outputs are direct, so you don’t have to interpret or piece together an answer based on the way it responds. When prompted with words like “summarise,” it does exactly that with no extra waffle. It’s great at handling straightforward tasks that don’t require much reasoning, like providing the company code “APL” as well as researching company information since it’s widely available, and for well-known brands like Apple, it’s hard to get wrong.
On the downside, the answers can feel very general. It doesn’t necessarily give you the exact answer you’re looking for, just one based on what it has access to and has been trained on. Since it doesn’t have real-time internet access, the information is only accurate up to the last point it was updated. For example, saying “the iPhone 15 was recently launched” might have been relevant in September 2023, but in February 2025, it’s outdated and pretty useless if you’re looking for “relevant news” now, which meant it failed that prompt. Compared to Gemini, ChatGPT’s response to the “relevant news” prompt focused on completely different things, mostly minor details that weren’t really “relevant or big,” and kept it very brief, sentence for sentence.
Another issue with real-time access is location specific queries. When asked for “company names” to help a client in New Zealand, it provided a list of mostly US-based companies and none from NZ. The responses also tend to be brief and lack depth when a bit more might be required. I also found that translations were very generic and done word-for-word rather than considering the context of a sentence. This could be misleading, especially for longer or more complex translations where meaning matters just as much as the words themselves.
One of the major disadvantages I found with ChatGPT's responses on the record card property is that they would sometimes change. I have screenshots of what I saw at the time, but when I came back an hour or so later the property would show a slightly altered version, which is concerning if you need reliable responses to assist with various things within HubSpot.
Gemini Pros and Cons
Gemini’s answers were a lot more in depth and had a genuine structure. It gave proper reasoning as well as different options to choose from. Just like ChatGPT, using certain words like “summarise” helped it get to the point. Without that, it sometimes went on a tangent.
Its reasoning and explanation for translations let you decide which output was most appropriate, considering different phrasing styles and context. Having that choice highlights the possibility that ChatGPT’s response could be incorrect, which is a win for Gemini. Providing different options allows the user to pick the best fit, making it more accurate in those cases.
Gemini’s responses showed its ‘thinking and reasoning.’ For the client specific recommendations, it gave local companies that would be relevant to the data given in the prompt (“the client”). The companies it provided were from NZ, and it highlighted the reasons why ones from another country could still be beneficial to use, which would achieve the purpose of having a “client-specific recommendation,” unlike ChatGPT’s response. Gemini easily completed the 3CODE, giving an output of “APP” for the company Apple, though that maybe isn't the most adequate label.
When looking at the “relevant and big news” output, Gemini gave five different headings, all on different aspects of Apple, even though it was still prompted to summarise the answer. Not only does it provide a wide range of news, but it also gives a statement on Apple's current state and what the future could potentially involve. This is significantly more than what ChatGPT gave, showing the differences in models and Gemini going above and beyond to provide information.
Another positive of Gemini is that it would give a summary of the answer even when it wasn't prompted to, which offsets the disadvantage of its answers being long and in-depth, sometimes unnecessarily so. The summary makes it easy to comprehend what has been outputted.
However, Gemini struggled to get to the point unless explicitly prompted. It sometimes included unnecessary details, like breaking down German words when that wasn’t asked for. This meant you had to sift through extra information instead of directly benefiting from AI’s efficiency.
Gemini also couldn't provide a direct answer to the company research prompt regarding the ‘number of iPhone models.’ Its response was “dozens and dozens” rather than a number. It listed all the model types up until 2023, which is of little use when that was 18 months ago and several more models have been released since then. The answer it gave felt like it was pretending to be right, but it failed to achieve the prompt when asked for the number of models, which was surprising.
Grok 3 Pros and Cons
Grok had a mix of both very basic as well as in-depth rich text style answers; for the most part, it did what it was asked. I felt its response to the app review translation was strong compared to the others. It kept it short and got straight to the point, even outputting the translation to make sense rather than a word-for-word translation.
The outputs were basic but had the necessary 'information' to achieve what was asked, even where the prompt ultimately failed. It still demonstrated to me that it was giving accurate and true information based on what it was trained on, which is a positive in itself, just not what you're looking for if you need something answered in real time.
*Side note: the time it took to see the output appear in my custom property felt a lot faster than the other 2 models, which I didn't expect beforehand.
People had claimed Grok wasn’t strong in writing; however, I felt this was a positive when looking at the 'Important and relevant news' output. It gave 5 very clear summarised points filled with accurate information (just not up to date). The generic, historical information on Apple was accurate and gave me what I wanted to see, which was on par with the other models too.
Grok did what it was told for the most part; it summarised when asked and answered the question the best it could. The only reason it's a fail is that, fundamentally, it wasn't achieving the desired request. This could be detrimental if it were actually used with real data that was going to influence decisions or, in this case, give client-specific recommendations that could potentially be incorrect.
Grok really was similar to ChatGPT in most of its outputs; however, it was the only model to get the 3 Code wrong, which was the easiest task given to the models. In fact, it gave a 4 code, which was not what was asked of it, meaning it failed on that, which was very surprising. As seen above, it gave 'APPL', which would have been fine if I'd asked for a 4 code, but I didn't.
It’s obvious that Grok doesn’t actually have real-time access as claimed. All the information, facts, and figures were 'up to the latest data available in 2023', and we are now in 2025, meaning it failed the prompts asking for current information: it didn’t give the statistics on Apple or the correct number of iPhones. However, unlike Gemini, it did at least give a number when asked for one.
One of the biggest negatives of Grok's outputs was the client-specific recommendations. It gave 3 companies, none of which were from NZ, with very little justification as to why they were selected. One of them was 'HubSpot', which I felt was inappropriate because, as stated in the prompt, that’s where our client data is already stored, so this proposed strategy wouldn’t work when I’m asking for different companies that could help. I felt this output was the worst out of the 5 and, in my eyes, failed entirely.
I then went to each respective model and asked them all the same prompt:
"Can you write me a blog about these prompts being put into our integration on HubSpot? Can you discuss the outputs and answers, talk about the pros and cons of what you gave me, share your thoughts on the answers, how well you did at answering them, and if you think it adds value to the HubSpot experience? Here are the prompts we gave and the answers. First prompt:"... Followed by the prompts and the outputs it gave.
Here are the blogs each respective model produced:
ChatGPT
Gemini
Grok
Our favourite responses and blog post came from Gemini, which proved superior thanks to its consistent accuracy, comprehensive detail, and practical application within the HubSpot context. It reliably delivered correct and current information, essential for informed business decisions. Beyond its lengthy answers, it provided valuable context and explanations, such as the nuanced distinctions between translation options. Critically, its practical application was evident in the lead generation recommendations, offering specific, justified company suggestions that would be highly valuable in a real-life context and extremely important to get right. It was the most consistent and came closest to meeting the prompts' requests compared to the other two.
After feeding all 3 blogs into each of the models, here's which ones the models thought were the best:
This was the prompt given
These are the verdicts each model gave on which output it thought was best.
"1st Place: Blog #2
✅ Most precise and context-aware—It provided multiple translations, a nuanced take on the iPhone model count, and tailored company recommendations.
✅ Best for business users—The lead gen suggestions were specific and named actual companies, adding real value for HubSpot users.
✅ Good balance of depth and conciseness—Responses were well-structured and contextualized but didn't overwhelm.
⚠️ Minor downside—It referenced a potentially outdated product name ("iPhone 16"), though this is a small issue.
2nd Place: Blog #3
✅ Strong factual accuracy—It got the Apple details right and stayed within the constraints for the client code prompt.
✅ Structured and comprehensive—Responses were clear, business-friendly, and covered multiple angles.
✅ Good translation and news summaries—Captured meaning well, though it could refine phrasing.
⚠️ Weaker personalization—The lead generation response was solid but not localized to New Zealand.
3rd Place: Blog #1
✅ Conversational and user-friendly—It had a natural tone, making it easy to read.
✅ Good with general facts and news—Nailed the Apple history and company updates.
⚠️ Precision issues—Missed constraints in the client code task and was vague on the iPhone model count.
⚠️ Lacked deeper insights—The lead generation recommendations were useful but generic, not tailored or innovative.
Final Thoughts
Gemini (#2) wins for its precision, contextual intelligence, and business relevance.
ChatGPT (#3) follows closely, delivering well-structured answers but lacking localized insights.
Grok (#1) has potential but stumbles on finer details, making it the weakest for HubSpot users."
"Alright, after analyzing each blog, here's my ranking and reasoning:
Ranking:
Reasoning:
In summary, Blog 2 (Gemini) stood out for its accuracy, detail, and practical application, making it the most valuable resource for understanding and leveraging AI within HubSpot. Blog 3 (ChatGPT) was a close second, providing a thorough and informative evaluation. Blog 1 (Grok) was the weakest of the three, with some inaccuracies and less detailed responses."
"Alright, I’ve carefully reviewed the three blogs—each testing an AI (ChatGPT, Gemini, and Grok) integrated into HubSpot with identical prompts—and I’m ready to rank them based on accuracy, reliability, and usefulness for getting the most out of a HubSpot experience. My evaluation focuses on how well each AI handles the prompts, the practical value of the outputs for HubSpot users (e.g., marketers, salespeople, or service teams), and the overall consistency and depth of the analysis provided in the blogs. Here’s my ranking, from best to worst, with reasoning:
Unanimously, each of the models said Gemini had the best response in terms of being the most accurate, the most reliable, and offering the most use for getting the most out of your HubSpot experience, followed by ChatGPT in second and Grok last.
Go and try this for yourself: have a play around with each of the apps, see what they're capable of, and find out how they can improve your HubSpot experience.
If you're not already signed up to MyGlue, you can do so here. This is where you can access all of our micro apps, keep track of your credit usage, change your billing info, and view all of our support and knowledge base articles.
We would love your thoughts and opinions on all of this. Go and check out the 3 different blog posts (linked just above) that were created by each of the different models and let us know if you agree or disagree with our thoughts.
If you have any use cases that you use these apps for, feel free to reach out to us; we're always curious to see how our customers are utilising our apps.