DeepSeek is a Chinese AI research lab known for its open-source large language models (LLMs). Since releasing its first model in late 2023, DeepSeek has rapidly iterated through multiple versions (V1, V2, V2.5, V3) with significant improvements in architecture, performance, and features. This changelog provides a timeline of DeepSeek’s model releases – including key updates on release dates, model architecture changes, training scale, performance benchmarks, language support, and major new features – to help users and developers understand how the DeepSeek LLM evolved over time.
DeepSeek LLM (V1) – November 2023: Initial Open-Source Model
Release: The original DeepSeek LLM was released in November 2023 as the company’s first open-source language model. This initial version (often referred to as V1) established DeepSeek’s commitment to low-cost, open development of AI models.
Architecture & Training: DeepSeek V1 was a dense transformer model with 67 billion parameters, trained from scratch on a bilingual corpus of 2 trillion tokens (English and Chinese). A smaller 7B-parameter variant was also provided for the research community. The model featured base and chat versions, with the chat model fine-tuned for conversational use. It supported a 4K token context window for inputs/outputs.
Performance: Despite its comparatively modest size, DeepSeek-67B V1 demonstrated strong capabilities for its time. It outperformed the contemporary Llama2-70B model on several benchmarks and excelled in coding and math tasks. For example, the 67B chat model achieved a HumanEval code score of 73.78 and a GSM8K math score of 84.1. It also performed strongly on Chinese-language understanding, reportedly surpassing OpenAI’s GPT-3.5 on Chinese tasks.
Open-Source & Usage: The V1 model weights were released under an open license (allowing commercial use), making DeepSeek-LLM one of the largest open-source chat models of 2023. Developers could access it via Hugging Face or DeepSeek’s API, and its launch laid the foundation for the fast-paced improvements in subsequent versions.
DeepSeek-V2 – May 2024: Mixture-of-Experts & Efficiency Boosts
Release: DeepSeek-V2 debuted in May 2024 as the second major version of the model. This update focused on dramatically improving performance and efficiency while keeping development costs low, and it marked DeepSeek’s transition to the sparse Mixture-of-Experts and latent-attention designs it had been developing in its research work.
Architecture: V2 introduced a Mixture-of-Experts (MoE) architecture for the model’s transformer blocks. The model scaled up to 236 billion total parameters, with ~21B parameters activated per token (sparsely). This design allowed V2 to allocate different “experts” to different tokens, improving computational efficiency. V2 also adopted Multi-Head Latent Attention (MLA), an innovation that compresses the key-value cache into latent vectors for faster inference. The context window was expanded to 128,000 tokens in V2, enabling much longer prompts and dialogues.
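To make the sparse-activation idea concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. It is not DeepSeek’s implementation (which adds shared experts, fine-grained expert segmentation, and the MLA attention variant); the layer sizes and top_k value below are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts feed-forward layer (not DeepSeek's code)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert for each token,
        # then only the top_k experts per token are actually evaluated.
        scores = F.softmax(self.router(x), dim=-1)
        weights, chosen = scores.topk(self.top_k, dim=-1)       # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                     # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Per-token compute stays roughly constant as the expert count (and total
# parameter count) grows, since only top_k experts run for each token.
x = torch.randn(16, 64)
layer = TinyMoELayer(d_model=64, d_hidden=256, n_experts=8, top_k=2)
print(layer(x).shape)  # torch.Size([16, 64])
```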
Training & Efficiency: DeepSeek-V2 was pretrained on 8.1 trillion tokens of high-quality data, a significant increase over V1. Despite the larger scale, the new architecture yielded substantial efficiency gains: training costs were reduced by ~42.5%, and the model’s inference throughput increased by 5.76× compared to the 67B V1 model. These optimizations meant V2 achieved higher performance at a fraction of the compute cost.
Performance: With only 21B active parameters at a time, DeepSeek-V2 matched top-tier open-source model performance in 2024. It delivered significantly stronger results than V1 across reasoning, coding, math, and knowledge benchmarks. The upgrade solidified DeepSeek’s reputation for “efficient, economical” LLM training. According to official reports, V2 achieved these gains while saving nearly half the training expense of the first model – an important milestone for open AI development.
Language & Features: Like V1, DeepSeek-V2 was bilingual (English and Chinese) and underwent Supervised Fine-Tuning and Reinforcement Learning steps to refine its conversational abilities. The model’s instruction-following improved markedly; internal evaluations showed prompt accuracy jumping from 63.9% to 77.6% after V2 fine-tuning. V2 also improved JSON-formatted output reliability (parsing success up from 78% to 97% with regex post-processing), reflecting better structured output capabilities.
Variant – DeepSeek-Coder-V2: In mid-2024, DeepSeek released a specialized coding-centric model based on V2. DeepSeek-Coder-V2 launched in June 2024 with the same MoE architecture (236B parameters, 128K context), tailored for programming tasks. It reached performance on par with GPT-4-Turbo in code generation and debugging, greatly enhancing DeepSeek’s coding abilities. The coder model still retained strong general reasoning, and its development fed back into the main line of models.
DeepSeek-V2.5 – September 2024: Unified Chat & Coding Model
Release: On September 5, 2024, DeepSeek introduced V2.5, a significant interim update that merged its general-purpose chat model and its code-specialized model into one. DeepSeek-V2.5 was presented as “a powerful combination of DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724” – essentially unifying the conversational prowess of the chat model with the robust code-processing skills of the coder model. This all-in-one model provided a more streamlined user experience without the need for separate endpoints.
Key Improvements: By blending the two V2 variants, DeepSeek-V2.5 retained the general conversational capabilities of DeepSeek-Chat and the strong coding/math abilities of DeepSeek-Coder. Moreover, it delivered better alignment with human preferences and instructions. The update brought “significant improvements in tasks such as writing and instruction-following”, reflecting more human-like and helpful responses. DeepSeek noted that V2.5 was tuned to align more closely with user intent while strengthening safety (more resistance to jailbreak prompts and fewer inappropriate outputs).
Performance: Internal benchmarks showed DeepSeek-V2.5 outperforming both the earlier V2-0628 chat model and the V2-0724 coder model on most tests. Notably, V2.5 achieved higher win rates against rival models (such as GPT-4o mini) in content creation and Q&A tasks. It also maintained excellent coding performance: V2.5 matched the coder model’s results on Python coding challenges (improving HumanEval and LiveCodeBench scores) and even improved fill-in-the-middle (FIM) completion success by ~5%. These gains made V2.5 one of the strongest open models of 2024 across a broad range of domains.
Backward Compatibility: For developers, DeepSeek-V2.5 was accessible via the same API endpoints as before – one could specify either the deepseek-chat or deepseek-coder model name and receive responses from the unified V2.5 model. Features such as Function Calling, fill-in-the-middle (FIM) completion, and JSON mode remained supported as in V2. This ensured an easy transition to the upgraded model without breaking existing integrations. Like its predecessors, V2.5 was fully open-sourced (model weights available on Hugging Face) to encourage community adoption.
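DeepSeek’s API is OpenAI-compatible, so a request at the time could look roughly like the sketch below (shown with the openai Python SDK; the base URL and model names follow DeepSeek’s public documentation, and the API key is assumed to live in a DEEPSEEK_API_KEY environment variable).

```python
import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint, so the standard SDK works
# once it is pointed at DeepSeek's base URL.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # assumed environment variable
    base_url="https://api.deepseek.com",
)

# After the V2.5 merge, "deepseek-chat" and "deepseek-coder" both resolved to
# the unified model, so existing integrations kept working unchanged.
response = client.chat.completions.create(
    model="deepseek-chat",                    # or "deepseek-coder"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python one-liner that reverses a string."},
    ],
)
print(response.choices[0].message.content)
```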
DeepSeek-V3 – December 2024: Massive Scale & New Innovations
Release: DeepSeek-V3 was unveiled on December 26, 2024, marking the third generation of DeepSeek’s LLM. The team touted this release as the “biggest leap forward yet” in the DeepSeek lineup. Alongside the model, DeepSeek provided an in-depth technical report and fully open-sourced the V3 weights, reinforcing its open-source mission.
Architecture & Scale: V3 dramatically expanded the Mixture-of-Experts design. The model boasts 671 billion total parameters, of which ~37B are active per token. This represents roughly a 3× increase in effective model capacity over V2. (In fact, a minor V3 update later bumped the total to 685B – see V3-0324 below.) Despite the enormous scale, DeepSeek-V3 retains the 128K context length support, allowing very long inputs and conversations. Under the hood, V3 builds on the innovations proven in V2 (such as the MLA mechanism and DeepSeek-MoE layers). Additionally, it pioneers a new training objective called multi-token prediction (MTP), enabling the model to predict multiple tokens in parallel during training. This was accompanied by an improved load balancing strategy for the MoE that required no auxiliary loss term – simplifying training while keeping experts utilized evenly. These architectural advances allowed V3 to scale up gracefully without instability (no loss spikes or training rollbacks were encountered despite the size).
Training Data: DeepSeek-V3 was pre-trained on an extremely large and diverse dataset of 14.8 trillion tokens, nearly doubling the token count from V2’s training. The team reported that the full V3 training consumed ~2.8 million GPU hours on H800 clusters, which is considered efficient for a model of this magnitude. Following pre-training, V3 underwent the usual supervised fine-tuning and reinforcement learning stages (including distillation of reasoning skills from DeepSeek’s R1 model) to maximize its capabilities.
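For a rough sense of cost, the V3 technical report pairs that GPU-hour figure with an assumed rental price of about $2 per H800 GPU hour – an illustrative assumption in the report, not an audited figure:

```latex
2.788 \times 10^{6}\ \text{GPU-hours} \times \$2/\text{GPU-hour} \approx \$5.6\ \text{million}
```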
Performance: DeepSeek-V3 delivered a new state-of-the-art among open models, reaching performance comparable to leading closed-source models like OpenAI’s latest offerings. On a broad array of benchmarks (knowledge quizzes, coding challenges, math problems, etc.), V3 not only surpassed all previous DeepSeek versions, but often matched or neared the scores of top proprietary models. Notably, V3’s massive scale particularly improved complex reasoning and multi-step problem solving. Yet, thanks to the efficient MoE design, inference speed actually increased – V3 can generate at ~60 tokens/second, roughly 3× faster than V2’s throughput. This speedup, combined with the huge context window, made V3 highly practical for applications like long-form chat, document analysis, and real-time assistant usage.
Language & Abilities: V3 continued DeepSeek’s focus on English and Chinese proficiency, further refining its Chinese writing and comprehension skills. The model produces more coherent, well-structured long-form responses in Chinese, benefiting from training insights borrowed from the R1 reasoning model’s style. Additionally, V3 improved at maintaining conversational context over very long dialogues (leveraging the 128K context). The generation quality, coding ability, and mathematical reasoning all saw major boosts – V3’s release notes highlight substantial gains on exams like MMLU, AIME and code benchmarks versus the previous generation. For example, V3’s math competition score jumped nearly 20 points (39.6 → 59.4) on the AIME test after adopting the new multi-token training objective. Such advances demonstrated that open models can close the gap with proprietary systems in both reasoning depth and answer accuracy.
Open Source & Ecosystem: As with prior versions, DeepSeek-V3 was released under an open-source license (MIT) with model weights and technical paper freely available. This openness, coupled with V3’s impressive capabilities, further galvanized the AI community – by early 2025, DeepSeek’s models were being integrated into various apps and even climbed app store charts, directly challenging offerings like OpenAI’s ChatGPT. The V3 launch also maintained API compatibility (developers could invoke it via model='deepseek-chat'
without changing code), ensuring a smooth upgrade. DeepSeek hinted that V3 was “just the beginning”, with plans to explore multimodal AI (vision, etc.) and other cutting-edge features in future releases.
DeepSeek-V3-0324 – March 2025: Major Mid-Cycle Upgrade
Release: On March 24, 2025, DeepSeek rolled out DeepSeek-V3-0324, a mid-cycle update to the V3 model. Though described by DeepSeek as a “minor upgrade” with no API changes, this version proved to be a surprisingly significant improvement over the initial V3. It included further training refinements and optimizations that boosted the model’s performance across the board. (The model’s name “0324” reflects the release date; it was also made available as an updated checkpoint on Hugging Face.)
Improvements: DeepSeek-V3-0324 kept the same MoE architecture as the December release; the 685B parameter count listed on Hugging Face includes the multi-token-prediction module on top of the 671B base model. The 128K context window was fully retained. Under the hood, further training refinements brought enhanced reasoning, coding, and math skills – testers observed a “quantum leap” in logical reasoning and problem-solving for what was billed as a minor patch. Benchmarks confirm the jump: V3-0324 outperformed the December V3 on challenging tasks, e.g. MMLU-Pro accuracy rose ~5.3 points, coding-test pass rates by ~10 points, and math-competition accuracy by almost 20 points on one exam (AIME). These gains vaulted DeepSeek-V3-0324 into the elite tier of LLMs in early 2025, rivaling or exceeding other top models in coding and analytic tasks.
Refinements: Beyond raw accuracy, the March 2025 update addressed practical usability issues. It improved the model’s Chinese writing quality and style, aligning it more closely with the refined tone of the R1 reasoning series. It also made the chatbot better at multi-turn interactive rewriting and translation tasks. Additionally, the reliability of function calling improved, resolving several errors present in the initial V3 release. These changes, though less visible in headline benchmarks, mattered for developers using DeepSeek in complex applications (e.g. tools that rely on the model to output structured data or execute function calls).
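Function calling on the DeepSeek API follows the OpenAI-compatible tools schema; the sketch below shows the general shape of such a request. The get_weather tool and its parameters are hypothetical examples for illustration, not part of DeepSeek’s API, and the response may contain plain text instead of a tool call if the model chooses to answer directly.

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

# A hypothetical tool definition in the OpenAI-compatible "tools" format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call (function name
# plus JSON-encoded arguments) comes back instead of plain text.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```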
Takeaway: DeepSeek-V3-0324 demonstrated DeepSeek’s rapid iterative approach – even incremental updates can yield significant quality boosts. The model update was open-source and backward-compatible, so all users of V3 benefited immediately by switching to the new checkpoint. By this time (early 2025), DeepSeek had shown that an open-source project could keep pace with, and even surpass, much larger competitors through focused innovation and community collaboration.
Conclusion and Future Outlook
Over the course of its first 18 months, DeepSeek’s language model evolved from the dense 67B-parameter V1 to the massive MoE-based V3, achieving remarkable improvements in capability, speed, and efficiency. Each version brought architectural advances – Mixture-of-Experts, MLA, and the 128K context window in V2; multi-token prediction and far greater scale in V3 – all while maintaining an open-source, research-friendly ethos. The timeline of DeepSeek updates (V1 → V2 → V2.5 → V3) reflects a deliberate strategy of combining cost-effective training with cutting-edge techniques to democratize advanced AI. General users today enjoy far better conversational quality, coding assistance, and multilingual support than was possible a year ago, thanks to these advancements. Developers and researchers, in turn, have gained access to state-of-the-art LLMs (and even distilled smaller versions) free of charge, empowering experimentation and real-world applications.
Looking forward, DeepSeek’s roadmap hints at even more ambitious steps – including multimodal capabilities (vision-language integration) and continued optimization of reasoning models. If the rapid progression of V1 through V3 is any indication, the coming releases may further narrow the gap between open-source and proprietary AI. The DeepSeek journey so far underscores how a committed open-source approach can drive fast-paced innovation in AI, to the benefit of the entire community.
Sources: The information above is compiled from official DeepSeek announcements, technical reports, and reputable analysis. Key references include DeepSeek’s GitHub/model cards, API documentation posts, and third-party coverage of each release, among others, as cited inline. This changelog will continue to be updated as new versions are released.