Running the Japanese DeepSeek-R1 Model on Google Colab (Free Tier)

Background
Environment Setup
Downloading the Model
Running the Model
Summary

Background

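CyberAgent has published Japanese versions of DeepSeek-R1 (the DeepSeek-R1-Distill-Qwen Japanese models) on Hugging Face. Loading them as published on the free tier of Google Colab, however, runs out of GPU memory. This article therefore shows how to run a quantized build of the model in GGUF format on Colab.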
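To make the memory problem concrete, here is a rough back-of-the-envelope sketch of weight size versus precision. It assumes a 16 GB GPU (the T4 that the free tier typically provides; the article does not state the GPU type) and nominal bit widths. Real GGUF files vary by quantization scheme, and the KV cache and runtime overhead come on top, so treat the numbers as estimates only.

```python
# Rough size of the model weights alone; the KV cache and runtime overhead add more.
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


ASSUMED_GPU_GB = 16  # assumption: a 16 GB GPU such as the T4; not stated in the article

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = estimate_weight_gb(14, bits)  # nominal 14B parameter count
    verdict = "fits" if size < ASSUMED_GPU_GB else "does not fit"
    print(f"14B @ {label}: ~{size:.0f} GB -> {verdict} in {ASSUMED_GPU_GB} GB")
```

Under these assumptions, the 16-bit weights of a 14B model alone exceed the GPU, while a 4-bit quantization leaves room for the context.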

Environment Setup

First, install Ollama, a tool that makes it easy to run LLMs in a local environment.

!curl -fsSL https://ollama.ai/install.sh | sh

Next, install the required drivers and dependencies.

!echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
!sudo apt-get update && sudo apt-get install -y cuda-drivers
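Before going further, it can help to confirm that the runtime actually has a GPU attached. The following is a minimal sketch that shells out to nvidia-smi; the helper name gpu_summary is hypothetical, and it assumes nvidia-smi is on the PATH when a GPU runtime is selected.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return one 'name, total memory' line per GPU, or None if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi is not installed or not on the PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # the tool exists but could not talk to a driver or GPU
    return result.stdout.strip() or None


print(gpu_summary() or "No GPU detected; check the Colab runtime type.")
```

If no GPU is reported, switching the Colab runtime type to a GPU runtime and rerunning the setup cells is the usual fix.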

Install the Ollama Python library.

!pip install ollama

Start the Ollama server in the background.

!nohup ollama serve &
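Because the server starts in the background, a pull or chat command issued immediately afterwards can fail if the server is not listening yet. The sketch below polls the server until it responds. It assumes Ollama's default address http://localhost:11434; the helper name wait_for_server and its timeout values are illustrative choices, not part of Ollama.

```python
import time
import urllib.request


def wait_for_server(
    url: str = "http://localhost:11434",  # Ollama's default listen address
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts, and urllib's URLError.
            pass
        time.sleep(interval)
    return False
```

Calling wait_for_server() in the next cell and continuing only when it returns True avoids racing the server startup.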

Downloading the Model

Download a GGUF build of the Japanese DeepSeek-R1 model.
mmnga and bluepen5805 have kindly published GGUF conversions on Hugging Face.

cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese · Hugging Face

bluepen5805/DeepSeek-R1-Distill-Qwen-14B-Japanese-gguf · Hugging Face

!ollama pull hf.co/mmnga/cyberagent-DeepSeek-R1-Distill-Qwen-14B-Japanese-gguf

!ollama pull hf.co/mmnga/cyberagent-DeepSeek-R1-Distill-Qwen-32B-Japanese-gguf

!ollama pull hf.co/bluepen5805/DeepSeek-R1-Distill-Qwen-14B-Japanese-gguf
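The pull commands above rely on the default quantization. Ollama's Hugging Face integration also accepts an optional quantization tag after a colon, in the form hf.co/{user}/{repo}:{quant}. The helper below is a small sketch for building such references; the function name is hypothetical, and the Q4_K_M tag in the example is an assumption, since whether a given repository ships that file has to be checked on its Hugging Face page.

```python
from typing import Optional


def hf_model_ref(user: str, repo: str, quant: Optional[str] = None) -> str:
    """Build an Ollama model reference for a GGUF repository on Hugging Face."""
    ref = f"hf.co/{user}/{repo}"
    # The quantization tag is optional; without it Ollama picks a default file.
    return f"{ref}:{quant}" if quant else ref


repo = "DeepSeek-R1-Distill-Qwen-14B-Japanese-gguf"
print(hf_model_ref("bluepen5805", repo))
# "Q4_K_M" is an assumed tag; verify that the repository actually provides it.
print(hf_model_ref("bluepen5805", repo, "Q4_K_M"))
```

In a Colab cell the result can be interpolated into the shell command, for example by assigning it to a variable ref and running !ollama pull {ref}.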

Check which models are currently downloaded.

!ollama list

Running the Model

Chat with the model using the Python ollama library.

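import ollama

# Use one of the models pulled above; the name must match the pull command.
MODEL = 'hf.co/bluepen5805/DeepSeek-R1-Distill-Qwen-14B-Japanese-gguf'

response = ollama.chat(model=MODEL, messages=[
  {
    'role': 'user',
    'content': '自己紹介して',  # "Introduce yourself"
  },
])
# Print only the text of the assistant's reply.
print(response['message']['content'])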
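DeepSeek-R1 family models usually emit their reasoning inside a <think>...</think> block before the final answer. Assuming the reply text follows that convention, the sketch below separates the reasoning from the answer. The helper name split_reasoning and the sample string are illustrative and not real model output.

```python
import re
from typing import Tuple

# Non-greedy match so that only the first <think> block is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a reply into (reasoning, answer); reasoning is '' if no block is found."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the matched block as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical reply used only to demonstrate the helper.
sample = "<think>\nユーザーは自己紹介を求めている。\n</think>\n\nこんにちは、AIアシスタントです。"
reasoning, answer = split_reasoning(sample)
print(answer)
```

With a real reply, the same helper could be applied to the content string of the chat response to show only the final answer.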

Summary

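The free tier of Google Colab does not have enough memory to run the original DeepSeek-R1 model. Using a GGUF build, however, makes it possible to run a lighter, quantized model on Colab. This article showed how to set up the environment with Ollama and run the Japanese version of DeepSeek-R1.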

The code introduced in this article is available here.