As an open-source AI powerhouse, DeepSeek invites comparison not just with proprietary models like GPT-4, but also with other open-source large language models.
In this article, we compare DeepSeek with notable open-source competitors, particularly Meta’s LLaMA 2 and the new Mistral models, along with mentions of others like Falcon.
We’ll explore how DeepSeek’s size and performance measure up, and consider factors like licensing, cost, and ease of use.
The goal is to see where DeepSeek stands in the broader open-source ecosystem and who might pose a threat to its position.
Why Compare DeepSeek to Other Open Models?
DeepSeek R1 gained attention by being the largest open-source LLM to date (671B parameters) and delivering performance on par with top closed models.
It broke new ground with an MIT license that allowed free commercial use, something many “open” models haven’t fully offered.
Naturally, this invites the question: How does DeepSeek fare against other open projects that came before or after it?
Meta’s LLaMA 2 (released mid-2023) was a significant open-ish model, with variants of 7B, 13B, and 70B parameters, touted as competitive with GPT-3.5.
Mistral AI (a French startup) entered the scene in September 2023 with Mistral 7B, a small model that surprisingly outperformed much larger ones, and promised bigger models to follow.
There are also models like Falcon (UAE’s 40B model), StableLM, and others. DeepSeek, being newer and larger, might outstrip many of these in raw performance – but factors like efficiency and accessibility are also crucial.
Let’s dive into specifics.
Model Size and Architecture
DeepSeek R1 – 671B (Mixture-of-Experts): DeepSeek’s flagship model is enormous in parameter count – 671 billion parameters – using a Mixture-of-Experts (MoE) architecture. However, it’s important to note that not all those parameters are active at once; DeepSeek R1 effectively activates ~37B parameters for any given query.
This MoE design allows it to be computationally efficient relative to its size, focusing different “experts” on different tasks. The benefit is that DeepSeek can achieve exceptional performance without always incurring the full cost of a 671B model.
The drawback is that it’s still a heavy model to run – it was trained on a cluster of many GPUs over months, and the full R1 model isn’t something you can typically run on a single machine.
DeepSeek did release distilled versions (down to 1.5B) to cover that gap, but the headline model requires server-class hardware or cloud resources (DeepSeek provides API access to it if you don’t have the hardware).
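To make the Mixture-of-Experts idea concrete, here is a toy routing layer in Python/PyTorch. It is only an illustrative sketch of the general mechanism (many experts exist, but only a few run per token), not DeepSeek’s actual architecture, and every size in it – model dimension, expert count, top-k – is invented for readability.

```python
import torch
import torch.nn.functional as F

class ToyMoELayer(torch.nn.Module):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts.

    Illustrative sketch only -- NOT DeepSeek's architecture. All sizes are
    made up; the point is that only top_k of n_experts run for each token.
    """
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = torch.nn.Linear(d_model, n_experts)   # gating network
        self.experts = torch.nn.ModuleList(
            [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x):                        # x: (tokens, d_model)
        gate_logits = self.router(x)             # score every expert per token
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the selected experts run
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot+1] * self.experts[e](x[mask])
        return out

x = torch.randn(5, 64)                           # 5 toy tokens
print(ToyMoELayer()(x).shape)                    # torch.Size([5, 64])
```

In a real MoE model the experts are large feed-forward blocks and the total parameter count is the sum over all experts, while the per-token compute scales only with the few experts the router selects – which is how a 671B-parameter model can activate only ~37B parameters per query.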
LLaMA 2 – up to 70B (Dense): LLaMA 2, from Meta AI, is a family of dense transformer models with 7B, 13B, and 70B parameter variants. The 70B version is the largest and most powerful openly available one (Meta has not openly released anything larger, so 70B is the ceiling). In terms of architecture, LLaMA 2 is a conventional LLM, not an MoE – every parameter participates in every forward pass.
At 70B, it’s much smaller than DeepSeek – roughly one-tenth of R1’s total parameter count, though of the same order as the ~37B parameters R1 activates per token. This smaller scale inherently limits LLaMA 2’s peak performance compared to something as large as DeepSeek, since parameter count often correlates with capability in LLMs.
However, size isn’t everything – LLaMA 2 was trained extensively on a large, curated dataset and carefully tuned (with instruction fine-tuning for the Chat variants), making it very efficient for its size.
Many in the AI community note that LLaMA2-70B approaches or surpasses GPT-3.5-level performance on a lot of tasks, though it still falls short of GPT-4. DeepSeek, by contrast, was aiming for GPT-4-level from the start.
Mistral Models: Mistral’s first public model, Mistral 7B (September 2023), has 7.3B parameters. Despite this small size, the Mistral team introduced architectural tweaks (such as grouped-query attention, sliding-window attention, and a long context window of up to 32k tokens) and careful training that allowed it to outperform models up to 13B in size.
Essentially, Mistral 7B was as good as or better than LLaMA 2’s 13B model on many benchmarks, and even competitive with some 30B+ models on certain tasks – a remarkable achievement in efficiency. Mistral has since moved up the scale, most notably with the Mixtral 8×7B mixture-of-experts model released in December 2023, and continues to work on larger systems.
By 2025, they have a “Mistral Large” model (possibly ~65B or a collection of experts) that they offer commercially via API.
That model uses advanced techniques to reach top-tier performance without matching DeepSeek’s brute-force parameter count. We’ll discuss performance in the next section.
Other notable open models: There’s Falcon (40B and 7B models released by UAE’s Technology Innovation Institute in 2023 under a permissive license).
Falcon-40B was among the earlier open models that challenged GPT-3 quality. There are also EleutherAI’s GPT-NeoX (20B) and Stability AI’s StableLM models, but these are older and generally lag behind LLaMA 2 and Mistral.
Meta has also been developing successors to LLaMA 2 (a LLaMA 3 generation, reportedly larger and possibly multimodal), but this comparison focuses on LLaMA 2 as Meta’s most established openly available model.
In terms of architecture innovations: DeepSeek’s big claim is that MoE allows it to be super large yet efficient.
Mistral’s innovation was focusing on training data quality and architectural tweaks at small scale.
LLaMA’s strength came from Meta’s vast pre-training on diverse data and their decision to open (ish) the model for research and commercial use (with some restrictions).
Licensing: This is crucial when comparing open models. DeepSeek R1 uses the MIT license, which is as permissive as it gets – anyone can use it, modify it, even integrate it into commercial products with no royalties.
LLaMA 2’s license, while called “open”, actually has some restrictions: for example, if your application has more than 700 million monthly active users, you’re required to get a special license from Meta.
Also, you cannot use LLaMA models to train other language models (which some argue disqualifies it from being true open source).
These clauses mean LLaMA 2 is not completely free for any use – it’s fine for most startups and companies, but big tech firms, or anyone exceeding that user threshold, technically need Meta’s permission. Mistral 7B, on the other hand, was released under the Apache 2.0 license, which, like MIT, imposes minimal restrictions (essentially just requiring attribution).
That made Mistral 7B truly open in usage. In summary: DeepSeek and Mistral are fully open from a licensing standpoint, whereas LLaMA2 is mostly open but with some strings attached (and Falcon was also fully open under Apache 2.0).
This distinction matters for companies that want to avoid any legal gray areas in deployment.
Performance: DeepSeek vs LLaMA2 vs Mistral
Benchmark Scores: DeepSeek R1 has claimed state-of-the-art results on many benchmarks when it was released.
For example, it achieved 97.3% on MATH-500 (a collection of math problems), on par with or slightly better than OpenAI’s o1 reasoning model (listed as “o1-1217” in DeepSeek’s paper).
It also hit 79.8% on the AIME 2024 math competition, slightly edging out OpenAI’s model. On coding tests, DeepSeek R1’s Codeforces rating is around 2029, meaning it outperformed ~96.3% of human participants in those coding competitions.
These are extremely high scores that smaller models couldn’t match. LLaMA2-70B, by contrast, scores much lower on these hardest tasks (for instance, Meta reported roughly 57% on the easier GSM8K benchmark and only around 13% on the harder MATH benchmark for LLaMA2-70B, versus the 90%+ scores that top reasoning models post on MATH-500).
One source shows that on a broad set of benchmarks, LLaMA 2’s overall score was around 69.9%, which is far below what GPT-4 or DeepSeek achieved. In fact, an analysis by Elephas.app put Mistral’s larger model ahead of LLaMA: Mistral Large scored 81.2% overall, versus 69.9% for LLaMA-70B.
DeepSeek R1 wasn’t explicitly listed there, but given DeepSeek’s claims of GPT-4 parity, we can infer R1 would likely be in the mid-80s or above on that scale (close to GPT-4’s 86.4% in the same reference).
Let’s compare DeepSeek and LLaMA2 on some specific aspects:
- Multitask QA (MMLU benchmark): DeepSeek’s documentation shows R1 scoring 84.0% on MMLU-Pro (a harder variant of MMLU; its score on standard MMLU is around 90%). LLaMA2-70B’s MMLU (5-shot) was reported around 68-70% in Meta’s paper. This is a large gap – on knowledge and reasoning across subjects, DeepSeek outperforms LLaMA2 by a significant margin. Even Mistral’s big model (which might be ~30-65B params) reached 81.2%, still below DeepSeek’s likely score.
- Coding (HumanEval or similar): GPT-4 scored ~67% on HumanEval at launch, and LLaMA2-70B around ~30%. DeepSeek’s coder variant reportedly reached 82.6% on HumanEval according to community claims on Quora, which if accurate is far beyond LLaMA2. Mistral’s own data puts Mistral Large at 45.1% on HumanEval – again below DeepSeek. So for pure coding tasks, DeepSeek appears to dominate other open models (unsurprising given its scale and additional fine-tuning for coding).
- Reasoning and Logic: On benchmarks like ARC (the AI2 Reasoning Challenge), Mistral Large scored ~94%, which is excellent. We don’t have a direct number for DeepSeek on ARC in the sources, but given DeepSeek’s performance on similar tasks (AIME, etc.), it is likely on par or higher. The fact that Mistral’s ~65B-class model generally ranks second only to GPT-4 suggests that DeepSeek R1 is either comparable to GPT-4 or slightly below it, but certainly above the rest. If GPT-4 is 86.4 and Mistral Large is 81.2, DeepSeek might be ~85-88 on that index (since DeepSeek claimed to beat OpenAI’s model on some tests and match it on others).
In plain terms, DeepSeek currently outperforms all other open-source models in raw benchmarks.
Its closest open competitor in performance might be Mistral’s largest version (which is available via API, not fully open weights yet for the largest one) or perhaps Meta’s hypothetical next LLaMA.
But as of now, no publicly released open model has matched DeepSeek R1’s combination of coding, math, and reasoning prowess.
Even smaller distilled versions of DeepSeek are quite strong – for example, DeepSeek’s 70B distilled model (distilled onto a LLaMA backbone) scores 94.5% on MATH-500 and 65.2% on GPQA Diamond, far beyond what the base 70B LLaMA achieves on those tasks, meaning the distilled 70B inherits a lot of R1’s skill.
However, performance isn’t the only consideration. It’s one thing to be the best on paper; it’s another for users to actually prefer or adopt a model. Let’s consider practical factors where smaller open models might compete:
- Hardware and Efficiency: DeepSeek R1 requires significant computing power to run due to its size. In contrast, LLaMA2’s 13B or 7B models can run on a single high-end GPU or even on a CPU (with some slowdown). Mistral 7B was explicitly designed to run on a single device – it’s only a ~14 GB download in 16-bit and can be loaded on a consumer-grade GPU. This means that for many developers and hobbyists, LLaMA 2 and Mistral are far more accessible; not everyone has access to something like an 8×A100 GPU server, the class of hardware DeepSeek needs. DeepSeek’s team did note that R1 was built on comparatively modest hardware (reportedly trained on NVIDIA H800 datacenter GPUs, the export-compliant variant of the H100 with reduced interconnect bandwidth), but the full model is still nowhere near “run it on your laptop” territory. As a workaround, DeepSeek provides smaller distilled models (e.g. a 7B or 14B based on Qwen or LLaMA) that one can run locally. Yet, if you use those, you’re no longer getting the full R1 performance – you’re getting something closer to what a LLaMA2 or other 7B can do, albeit boosted by knowledge distilled from R1. So, in terms of practical deployment: Mistral 7B or LLaMA2 13B are lightweight and cost-efficient, whereas DeepSeek R1 is a heavyweight champion that likely needs cloud infrastructure or a powerful server.
- Cost of Inference: Running fewer parameters means cheaper and faster inference. Even though DeepSeek’s MoE activates ~37B params per token, that’s still roughly half the compute of a 70B dense model for each token – meaning R1 can be surprisingly efficient for its size. The DeepSeek team lists their API cost at $0.55 per million input tokens and $2.19 per million output tokens. This is indeed cheaper than many competitors – for example, OpenAI’s GPT-4 32k context costs about $60 per million input tokens, so DeepSeek is orders of magnitude cheaper (a quick back-of-the-envelope comparison appears after this list). It’s also reportedly cheaper than Gemini or Claude in many cases. Compared with other providers, the Elephas comparison lists Claude 3.5 at around $6/M, Claude Opus at $26/M, Mistral Large at ~$3/M, and DeepSeek Pro at about $1/M. So DeepSeek’s API is extremely cost-competitive, even against other open-model providers. This low cost stems from its efficient design and perhaps some subsidization by the project. Meanwhile, if you run LLaMA2 or Mistral yourself locally, the cost is just your hardware and electricity – which for many small-scale uses is effectively negligible or a fixed cost. In a cloud scenario, running LLaMA2-70B might still cost more in compute time per query than hitting DeepSeek’s optimized API. It’s a bit counterintuitive: a bigger model but cheaper to use, thanks to optimizations and presumably lower-cost hardware at scale.
- Unique Strengths: LLaMA2 and Mistral might not beat DeepSeek in absolute performance, but they have niches. LLaMA 2 has been widely fine-tuned by the community for various specialized purposes (there are countless variants on HuggingFace for things like roleplaying chat, instruct tuning in other languages, etc.). This means if you need a model for a very specific domain, you might find a LLaMA2-based model already tailored for that. DeepSeek, being newer and larger, has fewer such fine-tuned variants publicly (though that could change as community adoption grows). Mistral demonstrated that smaller models can close the gap significantly: their 7B model matching 13B performance means they push efficiency. If Mistral releases, say, a 30B model, one might expect it to perform like a 60B+ regular model. This could start nibbling at the heels of DeepSeek’s distilled versions or even the full model on some tasks – especially if DeepSeek doesn’t scale down as efficiently. In other words, Mistral’s strategy could be a “size-efficient assault” on larger models: while DeepSeek went big, Mistral tries to get 90% of that performance with a fraction of the parameters. This is appealing for many use cases.
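To put those per-token prices in perspective, here is a quick back-of-the-envelope script using only the figures quoted above. The workload size is invented, and the GPT-4 output price is omitted because it isn’t cited here; actual pricing changes over time, so treat this as illustrative arithmetic rather than a pricing reference.

```python
# Back-of-the-envelope API cost comparison using the prices quoted above.
# Prices are per million tokens as cited in this article, not current pricing.

PRICES = {
    #               ($ / 1M input tokens, $ / 1M output tokens)
    "deepseek-r1":  (0.55, 2.19),   # DeepSeek API, as cited above
    "gpt-4-32k":    (60.00, None),  # only the input price is cited in this article
}

def monthly_cost(model, input_tokens, output_tokens):
    """Estimate spend for a given token volume (None = price not cited)."""
    in_price, out_price = PRICES[model]
    cost = input_tokens / 1e6 * in_price
    if out_price is not None:
        cost += output_tokens / 1e6 * out_price
    return cost

# Hypothetical workload: 100M input tokens and 20M output tokens per month.
for model in PRICES:
    print(model, f"${monthly_cost(model, 100e6, 20e6):,.2f}")
# deepseek-r1 -> $98.80      (55.00 input + 43.80 output)
# gpt-4-32k   -> $6,000.00   (input tokens alone)
```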
A concrete data point: Mistral’s team claims their Mistral Large (let’s assume this is around 65B dense or an MoE equivalent) is “among the top-performing AI systems” and in overall ranking stands second only to GPT-4, beating Claude 2, Gemini, GPT-3.5, and LLaMA2 in that comparison.
If DeepSeek R1 had been included, we can conjecture it would also be up there near GPT-4. So Mistral Large and DeepSeek R1 likely occupy a similar tier, with GPT-4 above them.
If Mistral eventually open-sources a Large model (they haven’t yet for the largest ones), that could become a direct open-source rival to DeepSeek’s supremacy.
Openness, Accessibility, and Community
Since we’re discussing open-source models, factors like community support, ease of use, and integration are key:
- DeepSeek being fully open and under MIT means researchers and companies can integrate it freely. It’s available on Hugging Face, and by early 2025 there were already 500+ derivative models created from R1’s weights (e.g., fine-tunes, compressions) with millions of downloads. This indicates a healthy and rapid adoption in the community. The downside is running R1 is non-trivial – many will use DeepSeek via the chat website or API rather than hosting it themselves. DeepSeek’s community is growing but it started essentially in late 2024, so it’s newer than LLaMA’s.
- LLaMA 2 has a huge community thanks to Meta’s release. However, its license complexity means truly integrating it into some products can give legal teams pause (for most it’s fine, but it’s not Apache or MIT). Still, tools and libraries widely support LLaMA models; you can run them in 8-bit quantization on a GPU, or even 4-bit on some (a minimal loading sketch follows this list), making them the go-to for many personal projects. LLaMA 2 also doesn’t have an official first-party API (Meta partnered with Microsoft Azure to offer it as a service, which is an option for enterprises), but open-source tooling such as the transformers library makes it easy to deploy.
- Mistral 7B being Apache 2.0 and only 7B means it’s incredibly accessible – you can even run it on a laptop with enough RAM using CPU. It has a 32k context which is impressive for a small model, giving it a unique selling point (long context + small size). As of 2025, Mistral’s larger models (like “Mistral Large 2”) are not fully open weights (they provide API access commercially). If they stick to their word of open-sourcing some bigger models, we may see a freely available 30B or 65B Apache model, which would be a strong competitor in the open domain.
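For readers who want to try one of these smaller models locally, a minimal sketch using Hugging Face transformers with 4-bit quantization (via the bitsandbytes package, on a CUDA GPU) might look like the following. The model ID is illustrative – substitute Mistral 7B, a LLaMA 2 variant, or one of DeepSeek’s distilled checkpoints, and check the exact repository names on Hugging Face.

```python
# Minimal sketch: load a small open model locally in 4-bit so it fits on one
# consumer GPU. Requires `transformers` and `bitsandbytes`. The model ID is an
# assumption for illustration -- verify the exact repo name on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"   # illustrative; swap as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # ~4-5 GB VRAM for a 7B model
    device_map="auto",                                           # place layers on available devices
)

prompt = "Explain mixture-of-experts models in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same pattern works for DeepSeek’s distilled 7B/14B checkpoints; only the model ID and the VRAM budget change.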
Training data transparency: Another aspect of openness: DeepSeek’s training details are somewhat less public than, say, Meta’s. Meta released a paper for LLaMA with info on data sources (though not full transparency due to using proprietary data).
Mistral presumably used a lot of the same open datasets. DeepSeek’s creators have not fully disclosed their exact dataset (and being in China, there’s some concern the data might have censorship biases, etc.).
This might matter for users who need to know what the model was trained on (e.g., any licensing issues in outputs).
With LLaMA2 and Mistral, since they are not fully open about data either (common in the industry), it’s a level field there.
Multilingual and Multimodal: DeepSeek is multilingual (supports English, Chinese, etc. as per reports), which is a plus.
LLaMA2 was mainly English-centric but can handle some other languages (Meta did some training on multiple languages). Mistral has a focus on multilingual as well (they mention English, French, German, etc. in their platform).
None of these open models (DeepSeek, LLaMA2, Mistral 7B) are inherently multimodal (they don’t take images or audio as input by default), unlike some of Google’s or OpenAI’s latest – that’s an area future open models might explore. There are projects adding vision to LLaMA (e.g., LLaVA) but natively, these are text-based. So in that sense, they’re comparable.
Key Differences at a Glance
To crystallize the comparison, here’s a quick rundown of DeepSeek vs LLaMA2 vs Mistral on major points:
- Performance: DeepSeek R1 is top-notch, roughly GPT-4 level on many tasks. LLaMA2 70B is more like GPT-3.5 level – very good, but a tier below. Mistral’s models aim to close this gap: Mistral’s largest (not fully public) approaches that top tier (second only to GPT-4 in their own ranking), while their 7B punches above its weight but still can’t match DeepSeek on absolute performance. In coding and math benchmarks, DeepSeek holds clear leads (10–30 percentage points higher success rates) over smaller models.
- Model Size: DeepSeek: massive (671B total, MoE), requiring significant resources. LLaMA2: up to 70B, easier to run but less powerful. Mistral: 7B (tiny) and a potential ~65B class – focused on doing more with less. For a given hardware budget, you might run several instances of a 7B model (each handling different tasks or users) instead of one much larger model – depending on your needs, that trade-off (throughput and concurrency versus single-model capability) may be preferable.
- License & Use: DeepSeek MIT (no strings attached); LLaMA2 custom (mostly free, just avoid the 700M user clause); Mistral 7B Apache 2.0 (completely free). So DeepSeek and Mistral are truly open. This means, for example, you can incorporate DeepSeek in a commercial product without worry – something many companies hesitate to do with LLaMA2 due to the license fine print.
- Context Window: DeepSeek R1 supports 64k–128k tokens (reports vary, but at least 64k is confirmed for R1, possibly extended to 128k in an update). LLaMA2’s context is 4k (up to 32k in some fine-tuned versions, but standard LLaMA2 Chat is 4k). Mistral 7B has a 32k context by default, which was a major selling point. So DeepSeek offers the largest context among open models, rivaling Claude and trailing only Google’s experimental 1M-token context in Gemini. This is a huge advantage for DeepSeek in open source: tasks needing long context (e.g., analyzing lengthy logs or multiple documents) can be handled by DeepSeek but not by a vanilla LLaMA2 or older models. Mistral’s 32k in a small model is nice, but DeepSeek still offers two to four times that.
- Hardware Requirements: Loading DeepSeek R1 fully in 16-bit precision would take roughly 1.3 TB of memory (671B parameters × 2 bytes), and still several hundred GB even with 8-bit weights – a quick script for this arithmetic follows this list. With MoE optimizations and sharding, that can be spread across many GPUs; running R1 reportedly requires dozens of lower-end GPUs or a smaller number of high-memory ones, and while the exact requirements aren’t publicly detailed, it is clearly heavy. LLaMA2 70B can run on a single server with 4× A100 80GB GPUs (320 GB of VRAM total) or even on one A100-80GB in 4-bit quantization. Mistral 7B runs easily on a single 16GB GPU; quantized to 8-bit it uses under 10GB. So for local deployment, smaller models win big. DeepSeek’s distilled models are offered to bridge this gap: e.g., a DeepSeek-distilled 70B LLaMA, which essentially gives you some of DeepSeek’s skills in a 70B package with the same requirements as LLaMA2-70B. That can be a sweet spot: better-than-plain-LLaMA performance (because it learned from R1) while still being feasible to run on moderate hardware.
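As a rough sanity check on those memory figures, the arithmetic is simple enough to script. These numbers count weight storage only and ignore activations, the KV cache, and framework overhead, so treat them as lower bounds rather than deployment requirements.

```python
# Rough weight-memory estimates for the models discussed above.
# Parameter storage only; real deployments also need memory for activations,
# the KV cache, and framework overhead.

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

MODELS = {
    "DeepSeek R1 (total)":  671e9,
    "DeepSeek R1 (active)": 37e9,   # parameters used per token, not a loadable subset
    "LLaMA 2 70B":          70e9,
    "Mistral 7B":           7.3e9,
}

for name, params in MODELS.items():
    sizes = ", ".join(
        f"{prec}: {params * b / 1e9:,.0f} GB" for prec, b in BYTES_PER_PARAM.items()
    )
    print(f"{name:22s} {sizes}")

# Sample output (abridged, values rounded):
#   DeepSeek R1 (total)   fp16/bf16: 1,342 GB, int8: 671 GB, int4: 336 GB
#   LLaMA 2 70B           fp16/bf16: 140 GB,   int8: 70 GB,  int4: 35 GB
#   Mistral 7B            fp16/bf16: 15 GB,    int8: 7 GB,   int4: 4 GB
```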
Strategic Implications: Is DeepSeek’s Lead Secure?
DeepSeek’s emergence certainly raised the bar for open models – having an MIT-licensed model that outperforms most closed models was unprecedented.
But the open-source AI landscape is very dynamic:
- Meta (LLaMA): Meta could respond with LLaMA 3 or further refined models. If they manage to produce, say, a 200B parameter model and partially open it (or even just release it to select researchers), that might narrow the gap. They have enormous compute resources and talent, so DeepSeek won’t be unchallenged. However, Meta’s openness is partly pragmatic (to gain community adoption); they might not push to the absolute cutting edge if they fear competitive misuse. Still, they did mention working on multimodal capabilities for future LLaMAs.
- Mistral and other startups: Mistral explicitly positions itself as an “open generative AI” leader, aiming for state-of-the-art performance with open models. They achieved a remarkable amount within months of founding with just a 7B model. With $113M in seed funding and founders hired from Google DeepMind and Meta, they are a serious contender. If Mistral releases larger models under Apache 2.0 like their first, DeepSeek will have direct open rivals approaching its capability. The question is whether a 30B or 60B dense model can match DeepSeek’s 37B-active MoE. Possibly not fully, but it might get close enough for many applications – and be far easier to use. So DeepSeek must keep improving (which they are, as evidenced by R1 updates and perhaps an R2 on the horizon).
- Other Open Efforts: The Falcon model from the UAE had a strong initial release (Falcon-40B topped some open-model leaderboards in mid-2023), but it was somewhat overshadowed by LLaMA2 and Mistral. Development has continued – a much larger Falcon 180B was released in September 2023 – and if more nations or groups (Stability AI, EleutherAI, etc.) build big open models, DeepSeek will be one of several, not the only game in town.
One more angle: Community & Ecosystem. DeepSeek being from a Chinese team means much of its community might be in China, while LLaMA/Mistral have strong Western open-source communities.
There may well be forks or adaptations of DeepSeek that integrate it more deeply into Western tooling and frameworks.
Already, we saw tools incorporating DeepSeek (for example, it’s available on HuggingFace and got integrated into Perplexity.ai’s search assistant).
Over time, if DeepSeek remains significantly stronger, the community will rally around it despite the hardware demands – possibly creating pruned or quantized versions.
However, open-source enthusiasts often value models they can run themselves easily.
A quote from a user on Hacker News captures it: “DeepSeek-R1 is better than Claude 3.5 and OpenAI’s model… but you need serious gear to use it”.
If a smaller open model is “good enough” for someone’s needs, they might opt for that instead of chasing the absolute state of the art.
DeepSeek might respond by releasing more efficient versions or the upcoming R2 which could incorporate efficiency improvements.
Conclusion: DeepSeek’s Place Among Open Models
DeepSeek R1 currently stands at the pinnacle of open-source model performance, outperforming Meta’s LLaMA 2 and showing up even newer rivals in benchmarks.
Its release was a game-changer – for the first time, a fully open model matched the capabilities of the best proprietary models like GPT-4.
This has positioned DeepSeek as a flagship of what open AI can achieve, prompting even discussions in media about how it might alter the AI landscape and global competition.
That said, the competition in open-source is heating up. LLaMA 2 set a strong foundation and still is the go-to for many because of its ease of use and community adoption.
Mistral has shown that agility and smart engineering can yield outsized results (a 7B model nearly beating 13B and competing with bigger ones). As Mistral and others release bigger models, DeepSeek will need to continue leveraging its two key advantages: massive scale and reasoning optimization.
DeepSeek’s team likely won’t stop at R1 – an R2 model could push performance even higher or improve efficiency.
If R2, for example, implements multimodal inputs or further closes any gaps in general capabilities, it would keep DeepSeek ahead of the pack.
From a user’s perspective, choosing between DeepSeek, LLaMA2, Mistral, etc., depends on needs:
- If you need the best accuracy and are willing to use substantial compute or a hosted API, DeepSeek is the choice – it’s essentially getting GPT-4-like power without the closed license, which is incredibly attractive for serious applications.
- If you need something you can tinker with on a personal PC or you prioritize efficiency, a smaller model like LLaMA-13B or Mistral-7B might suffice, acknowledging that you trade off some quality.
- For many, a middle ground could be using DeepSeek’s distilled 70B model, which gives better performance than vanilla LLaMA2-70B (thanks to knowledge distillation from R1) while remaining feasible to run locally.
In the open-source spirit, it’s not zero-sum: DeepSeek’s existence doesn’t diminish LLaMA or Mistral – instead, it raises the ceiling, inspiring others to reach that level.
Meta’s AI leadership might not publicly acknowledge it, but internally, projects like DeepSeek likely spur them to invest more in releasing powerful models.
And Mistral’s success in raising funds and attention was partly because the community was excited about open alternatives – a wave that DeepSeek is also riding.
To sum up, DeepSeek is currently the benchmark for open-model performance, while LLaMA 2 remains the popular workhorse and Mistral is the ambitious up-and-comer.
DeepSeek’s lead in benchmarks is secure for now, but the race is far from over. For the AI community and consumers, this competition is a big win: it means faster innovation, more options, and lower costs.
DeepSeek versus other open models is reminiscent of open-source software battles – with a mix of collaboration and healthy rivalry.
We can expect in 2025 and beyond that these projects will leapfrog each other. Today DeepSeek might be “king of the hill,” but tomorrow’s breakthrough (be it R2, LLaMA 3, or Mistral Large open release) could dethrone it.
For anyone invested in open AI, the key takeaway is that DeepSeek has proven open models can play in the big leagues, and it sets a target for others to aim at.
In choosing what to use, consider factors like performance needs, infrastructure, and license.
And keep an eye on new developments – open models are evolving rapidly, and the gap between them and proprietary models is narrowing with each iteration.