[LLM] GPT4All: LLaMA You Can Run on a Local PC
Variants of Meta's LLaMA are breathing new life into chatbot research. Now Nomic AI, the world's first information cartography company, has released GPT4All, a model fine-tuned from LLaMA 7B. It has proven hugely popular, earning 24.4k GitHub stars within two weeks of release. (LLaMA, released two months earlier, had 16.9k GitHub stars.)
Genealogy of LLaMA-family chatbots
The secret to GPT4All's popularity is that it can run even on a local PC. To make this possible, the Nomic AI researchers applied 4-bit quantization to GPT4All. Let's look at GPT4All, from how it was trained to how to use it.
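To get an intuition for what 4-bit quantization buys, here is a minimal Python sketch of block-wise round-to-nearest weight quantization. This is only an illustration of the idea under simplified assumptions; the released GPT4All binary uses the ggml/llama.cpp quantization format, whose exact layout differs.

import numpy as np

def quantize_4bit(w, block_size=32):
    # One scale per block of 32 weights; values are rounded to the int4 range.
    blocks = w.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(4096).astype(np.float32)  # stand-in for a weight tensor
q, scale = quantize_4bit(w)
print("mean abs error:", np.abs(w - dequantize_4bit(q, scale)).mean())

Storing 4-bit codes plus one scale per block is what shrinks the 7B model's weights from roughly 13 GB in fp16 to the roughly 4 GB checkpoint mentioned below.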
How GPT4All Was Trained
Data Collection
Using the GPT-3.5-Turbo OpenAI API, roughly one million prompt-response pairs were generated between 2023/3/20 and 2023/3/26. Three public datasets were used to source the questions/prompts (a sketch of the generation step follows this list):
- The unified chip2 subset of LAION OIG.
- Coding questions with a random sub-sample of Stackoverflow Questions
- Instruction-tuning with a sub-sample of Bigscience/P3
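Conceptually, the collection step just sends each seed prompt to GPT-3.5-Turbo and saves the resulting prompt-response pair. Below is a minimal sketch using the openai Python package; Nomic AI's actual pipeline is not public, so the file names and structure here are illustrative.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def collect_pair(prompt):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return {"prompt": prompt, "response": resp.choices[0].message.content}

seed_prompts = ["Explain self-attention in one paragraph."]  # stand-in for the three datasets above
with open("pairs.jsonl", "w") as f:
    for p in seed_prompts:
        f.write(json.dumps(collect_pair(p)) + "\n")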
The final prompt-generation pairs were produced in two steps.
Step 1: Using Atlas (Nomic AI's data analysis tool), Nomic AI applied data curation and cleaning to the initial ~1M prompt-response pairs generated with GPT-3.5-Turbo, yielding 806,199 high-quality prompt-generation pairs.
Step 2: The Bigscience/P3 outputs were removed due to low diversity, leaving a final set of 437,605 prompt-generation pairs.
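The curation itself was done interactively in Atlas, so there is no public script for it; the sketch below only illustrates the mechanical part of such cleaning (dropping empty or trivially short generations and exact duplicates). The thresholds are assumptions, not the ones Nomic AI used.

import json

def clean(pairs, min_len=20):
    seen, kept = set(), []
    for pair in pairs:
        prompt = pair["prompt"].strip()
        response = (pair.get("response") or "").strip()
        if len(response) < min_len:  # drop failed or trivially short generations
            continue
        key = (prompt, response)
        if key in seen:              # drop exact duplicates
            continue
        seen.add(key)
        kept.append(pair)
    return kept

with open("pairs.jsonl") as f:
    pairs = [json.loads(line) for line in f]
print(len(clean(pairs)), "of", len(pairs), "pairs kept")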
GPT4All's dataset preparation pipeline
Model Training
GPT4All was trained as follows (a condensed sketch of this recipe appears after the list):
- Fine-tuned LLaMA 7B
- Applied LoRA
- Used 437,605 prompt-generation pairs
- 4 epochs
- 4 days of training
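The released artifact is gpt4all-lora, i.e., LoRA adapters trained on top of LLaMA 7B. Below is a condensed sketch of such a setup using Hugging Face transformers and peft; it mirrors the recipe above but is not Nomic AI's training script, and everything other than the epoch count and dataset is an assumed hyperparameter.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "path/to/llama-7b-hf"  # an assumed local LLaMA-7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Freeze the base model and inject trainable low-rank adapters.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

data = load_dataset("json", data_files="pairs.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["prompt"] + "\n" + ex["response"],
                                     truncation=True, max_length=512),
                remove_columns=data.column_names)

Trainer(model=model,
        args=TrainingArguments("gpt4all-lora", num_train_epochs=4,
                               per_device_train_batch_size=4, learning_rate=2e-4),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
        ).train()

Because only the low-rank adapter weights are updated, this kind of run fits on rented A100s for a few hundred dollars, which is consistent with the cost breakdown below.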
Training cost (total: $1,400)
- GPU costs: $800 (rented from Lambda Labs and Paperspace)
- OpenAI API usage: $500
- gpt4all-lora: $100 (Lambda Labs DGX A100 8x80GB)
Evaluating GPT4All
Using the human evaluation data from the Self-Instruct paper, the perplexity of GPT4All was compared with that of alpaca-lora, the best-known public model; GPT4All achieved a statistically significantly lower ground-truth perplexity than alpaca-lora.
source: https://s3.amazonaws.com/static.nomic.ai/gpt4all/2023_GPT4All_Technical_Report.pdf
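Ground-truth perplexity here measures how surprised a model is by the human-written reference answers: the exponential of the average token-level cross-entropy, where lower is better. A minimal sketch with transformers follows; gpt2 is used only as a runnable stand-in for the checkpoint being evaluated, and the report's exact protocol may differ.

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in; substitute the chatbot checkpoint being evaluated
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

def perplexity(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # over next-token predictions; perplexity is its exponential.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

print(perplexity("The Transformer is a neural network architecture based on self-attention."))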
How to Use It
Caveats Before Using GPT4All
- GPT4All is built on LLaMA, which is distributed under a non-commercial license.
- The GPT4All model weights and data may only be used for research purposes; commercial use is prohibited.
- The GPT4All researchers used OpenAI's GPT-3.5-Turbo to generate the prompt-generation pairs for fine-tuning, and OpenAI prohibits using its products to develop models that compete commercially with OpenAI.
Setup
Setting up GPT4All is very easy. Following the instructions in the GPT4All GitHub repository, download the GPT4All checkpoint (about 4 GB), then run the binary for your OS.
I tested the GPT4All model on a MacBook M1 (16 GB):
chat % ./gpt4all-lora-quantized-OSX-m1
main: seed = 1680942097
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
== Running in chat mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMA.
- If you want to submit another line, end your input in '\'.
Q&A Comparison (GPT4All vs. ChatGPT)
Although this is not a comparison under equal conditions, I compared the answers from GPT4All and ChatGPT.
#1. What is the Google Transformer?
[GPT4All] The explanation lacks detail, and parts of it do not make sense (e.g., it describes the Transformer as "systems").
> What’s google transformer?
Google Transformers are neural network-based models developed by Google that aim at generating textual content based upon the inputted data, such as images or videos. These systems can be trained to generate a summary of an article or provide captions for YouTube video clips.
[ChatGPT]
> What’s google transformer?
Google Transformer refers to a deep learning architecture developed by Google researchers in 2017 for natural language processing (NLP) tasks. The architecture, known as the “Transformer,” is based on the concept of self-attention and is designed to process sequential data, such as text.
The Transformer architecture was a breakthrough in NLP, outperforming traditional recurrent neural network (RNN) architectures such as the long short-term memory (LSTM) network, which were previously the standard for NLP tasks. The Transformer uses self-attention to allow the model to focus on different parts of the input sequence, allowing it to better capture long-range dependencies and understand the context of each word in a sentence.
The Transformer has been used in a variety of NLP tasks, including machine translation, text classification, and question answering, and has been the basis for many state-of-the-art models in NLP, such as BERT and GPT.
#2. What is GPT?
[GPT4All] Again the answer lacks detail, and it gives the incorrect information that GPT consists of an encoder and a decoder.
> What’s the Generative Pretrained Transformer?
The generatively pretrained transformers are a type of neural network architecture that is designed to learn representations from unlabeled data. They consist of two parts, namely an encoder and decoder part which work in tandem with each other during training time. The input for the model can be text or images while it outputs either vectors (in case of language models) or 3D shapes (for image generation).
[ChatGPT]
> What’s the Generative Pretrained Transformer?
The Generative Pretrained Transformer (GPT) is a deep learning architecture developed by OpenAI in 2018 for natural language processing (NLP) tasks. GPT is based on the Transformer architecture developed by Google, but it was adapted to be used for language generation tasks such as text completion and story generation.
The GPT architecture is a type of unsupervised learning, which means that it is trained on large amounts of unstructured text data without any specific target output. During training, GPT uses a technique called masked language modeling, where some of the words in a text are randomly masked, and the model is trained to predict the masked words based on the context of the surrounding words.
GPT has been trained on massive amounts of text data, including books, articles, and websites, and has achieved state-of-the-art results on a range of NLP tasks, including language translation, text classification, and question answering. OpenAI has released several versions of GPT, with the latest being GPT-3, which is one of the largest and most powerful language models to date, with over 175 billion parameters.
#3. Writing a blog post about GPT4All
[GPT4All] Noticeably less specific than ChatGPT's answer.
> I want to write about GPT4All. Show me what I can write for my blog posts.
Sure! Here are some ideas you could use when writing your post on GPT4all model: 1) Explain the concept of generative adversarial networks and how they work in conjunction with language models like BERT. You may also want to discuss their limitations, such as bias or data privacy concerns. 2) Discuss some recent advancements made by researchers using GPT4all model for tasks beyond text generation (e.g., image captioning). 3) Explore how the model can be used in real-world applications like chatbots and customer service, as well as its potential impact on industries such as journalism or marketing. Overall, your post should aim to provide a comprehensive overview of GPT4All’s capabilities while also highlighting some practical use cases for the model moving forward in AI research.
[ChatGPT]
> I want to write about GPT4All. Show me what I can write for my blog posts.
GPT4All is an exciting new platform that brings the power of the Generative Pretrained Transformer (GPT) to a wider audience. Here are some ideas for blog posts about GPT4All:
Introducing GPT4All: The Next Generation of Language Generation — In this post, you can provide an overview of what GPT4All is, how it works, and why it’s important. You can talk about how GPT4All democratizes access to powerful language generation tools and how it can benefit businesses and individuals.
The Benefits of GPT4All for Content Creation — In this post, you can explore how GPT4All can be used to create high-quality content more efficiently. You can discuss how GPT4All can help content creators generate ideas, write drafts, and refine their writing, all while saving time and effort.
GPT4All vs. Other Language Generation Platforms — In this post, you can compare GPT4All to other language generation platforms on the market, highlighting the advantages of GPT4All’s technology, user interface, and pricing. You can also address any potential drawbacks of GPT4All and how the platform plans to address them.
Use Cases for GPT4All — In this post, you can showcase how GPT4All can be used in various industries and applications, such as e-commerce, social media, and customer service. You can also provide examples of how businesses and individuals have successfully used GPT4All to improve their workflows and outcomes.
How GPT4All is Revolutionizing Language Generation — In this post, you can delve into the technical details of how GPT4All’s architecture and training methods differ from other language generation models. You can also discuss how GPT4All’s innovations are pushing the boundaries of what is possible in natural language processing.
Overall, there are many angles you can take when writing about GPT4All, whether you focus on its technical aspects, its use cases, or its impact on the industry. Whatever approach you choose, GPT4All is a fascinating platform that is sure to captivate your readers.
#4. Who is the President of the United States?
[GPT4All] It thinks the current U.S. president is Donald Trump, and the answer drifts off into off-topic, fabricated details.
> who’s the president of US?
The current President is Donald Trump, who assumed office on January 20th, 2017 after winning a landslide victory in November 8th election against Hillary Clinton. He was inaugurated as the nation’s chief executive and commander-in-chief of its armed forces at noon that day by Chief Justice John G. Roberts Jr., who administered his oath to him on board Air Force One, which had been parked in a remote area near Andrews Field outside Washington DC for security reasons after the inauguration ceremony was delayed due to inclement weather conditions and Trump’s motorcade being stuck in traffic jams caused by protesters.
[ChatGPT]
> who’s the president of US?
I’m sorry, as an AI language model, I am not able to provide real-time information. However, as of my knowledge cutoff date of September 2021, the President of the United States was Joe Biden. If there has been a change since then, I do not have that information.
Overall Assessment
GPT4All is a small chatbot model with the following characteristics:
- Fine-tunes the LLaMA 7B model
- Uses GPT-3.5-Turbo to generate prompt-generation pairs
- Applies 4-bit quantization
- Applies LoRA
- Low training cost ($1,400)
Whether because of the 4-bit quantization or the limits of the LLaMA 7B model, GPT4All's answers tended to lack specificity, and it often failed to understand the question. For questions in Korean, its answers were nearly useless.
Even so, being able to run a chatbot on hardware like a MacBook M1 is significant, and GPT4All should be a very useful model for individuals doing research. Meta's release of LLaMA has accelerated work on small chatbot models, and it will be interesting to see how far the LLaMA variants evolve.
Meanwhile, open-source chatbots such as Alpaca and GPT4All fine-tune on prompt-generation pairs distilled from proprietary foundation models like ChatGPT. This is likely to be the biggest obstacle to commercializing open-source chatbots beyond research use, and new ideas will be needed to overcome it (for example, building an open-source foundation model via federated learning).
References
[1] GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
[2] GPT4All GitHub repository