
Using KoAlpaca with LangChain

IP_DataScientist 2023. 6. 13.

Model information

Basic approach

https://github.com/Beomi/KoAlpaca

GitHub - Beomi/KoAlpaca: KoAlpaca, a Korean language model that understands instructions

import torch
from transformers import pipeline, AutoModelForCausalLM

MODEL = 'beomi/KoAlpaca-Polyglot-5.8B'

# Load the weights in float16 and move them to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(device="cuda", non_blocking=True)
model.eval()

# The model object is passed in directly; the tokenizer is loaded by name.
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=MODEL,
    device=0
)

def ask(x, context='', is_input_full=False):  # is_input_full is unused here
    # Prompt markers: "### 질문:" (question), "### 맥락:" (context), "### 답변:" (answer).
    prompt = (
        f"### 질문: {x}\n\n### 맥락: {context}\n\n### 답변:"
        if context
        else f"### 질문: {x}\n\n### 답변:"
    )
    ans = pipe(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        return_full_text=False,
        eos_token_id=2,
    )
    print(ans[0]['generated_text'])

ask("딥러닝이 뭐야?")  # "What is deep learning?"

Result:
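The prompt layout that ask passes to the pipeline can be checked without loading the model. The sketch below isolates it in a small helper. build_prompt is a hypothetical name that does not come from the KoAlpaca repository; the section markers are copied from the code above.

```python
def build_prompt(question: str, context: str = "") -> str:
    """Return a KoAlpaca-style prompt (hypothetical helper, not part of KoAlpaca)."""
    # "### 질문:" = question, "### 맥락:" = context, "### 답변:" = answer
    if context:
        return f"### 질문: {question}\n\n### 맥락: {context}\n\n### 답변:"
    return f"### 질문: {question}\n\n### 답변:"


# Print the prompt that would be sent for the example question.
print(build_prompt("딥러닝이 뭐야?"))
```

Printing the prompt first is a cheap way to confirm that the context section only appears when a context string is actually supplied.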

Loading the model through LangChain

import torch
from langchain import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="beomi/KoAlpaca-Polyglot-5.8B",
    task="text-generation",
    model_kwargs={"temperature": 0, "max_length": 2048, "torch_dtype": torch.float16},
    device=1,
)

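Maximum token length: 2048 (see the model config)
torch_dtype: torch.float16
(The model is already set up for float16 training, but the dtype still has to be passed again at load time. Training is usually done in float32, so when torch_dtype is not given the loader reportedly tends to load the weights at the larger size, which inflates GPU memory use.)
device = 1 (use GPU 1)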
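The memory difference behind the torch_dtype note can be estimated with simple arithmetic. This is a rough sketch that counts the weights only, and it assumes the parameter count is the nominal 5.8 billion suggested by the model name; activations and the generation cache are not included.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: nominal size taken from the "5.8B" in the model name.
NUM_PARAMS = 5.8e9

fp16 = weight_memory_gib(NUM_PARAMS, 2)  # float16 stores 2 bytes per parameter
fp32 = weight_memory_gib(NUM_PARAMS, 4)  # float32 stores 4 bytes per parameter

print(f"float16: {fp16:.1f} GiB, float32: {fp32:.1f} GiB")
```

Under these assumptions the float32 copy needs about twice the memory of the float16 copy, which is why passing torch_dtype explicitly matters on a single GPU.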

from langchain import PromptTemplate, LLMChain

# "단계별로 생각해 보자" means "Let's think step by step".
template = """Question: {question}

Answer: 단계별로 생각해 보자"""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "뇌파 검사가 뭐야?"  # "What is an EEG test?"

print(llm_chain.run(question))

Result:
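The chain fills the {question} placeholder before calling the model, so the text the model receives can be previewed with plain string formatting. The sketch below does this without LangChain, on the assumption that the default template format behaves like str.format for a simple placeholder. It also shows a variant template that reuses the "### 질문:" / "### 답변:" markers from the basic example. koalpaca_template is a hypothetical name, and whether that layout gives better answers through the chain is not tested in this post.

```python
template = """Question: {question}

Answer: 단계별로 생각해 보자"""

# Hypothetical variant that reuses the KoAlpaca prompt markers from the basic example.
koalpaca_template = "### 질문: {question}\n\n### 답변:"

question = "뇌파 검사가 뭐야?"

# Fill the placeholder the same way for both templates.
rendered = template.format(question=question)
rendered_koalpaca = koalpaca_template.format(question=question)

print(rendered)
print("---")
print(rendered_koalpaca)
```

Comparing the two rendered prompts makes it easy to see that the chain above does not use the instruction format from the basic example unless the template is written that way.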

References

https://keepdev.tistory.com/8

TensorFlow: why float32 is used

