I clocked in over twenty minutes late today. I was working overtime and then chatting with my younger brother for a while; after that I picked an article about using LLMs for TDD, but it was too long. To still clock in on the same day I switched to another article, yet I ended up late anyway, and the article I chose turned out to be of mediocre quality, which felt like a waste. In any case, I'll use this article to complete today's clock-in, and the LLM-for-TDD article will be the content of the next one.
A:263. Ugly Number#
An ugly number is a positive integer whose prime factors only include 2, 3, and 5.
Given an integer n, please determine if n is an ugly number. If so, return true; otherwise, return false.
Example 1:
Input: n = 6
Output: true
Explanation: 6 = 2 × 3
Example 2:
Input: n = 1
Output: true
Explanation: 1 has no prime factors; its set of prime factors is the empty set, which is trivially a subset of {2, 3, 5}. It is conventionally regarded as the first ugly number.
Example 3:
Input: n = 14
Output: false
Explanation: 14 is not an ugly number because it has the additional prime factor 7.
This is also a relatively simple problem. I couldn't think of any clever mathematical approach at first, and I initially overlooked that 0 is not an ugly number. After fixing that, I submitted the following:
function isUgly(n: number): boolean {
  if (n === 0) {
    return false
  }
  if (n === 1) {
    return true
  }
  if (n % 2 === 0) {
    return isUgly(n / 2)
  }
  if (n % 3 === 0) {
    return isUgly(n / 3)
  }
  if (n % 5 === 0) {
    return isUgly(n / 5)
  }
  return false
}
The submission result was:
1013/1013 cases passed (56 ms)
Your runtime beats 100% of TypeScript submissions
Your memory usage beats 38.09% of TypeScript submissions (44 MB)
It's worth noting that I implemented this using recursion, but it could also be done with a while (true) loop.
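For reference, here is a minimal sketch of that iterative shape, written in Python rather than the TypeScript above: keep dividing out 2, 3, and 5 inside a single loop until either 1 remains or none of the allowed factors divides n.

def is_ugly(n: int) -> bool:
    # Non-positive numbers are never ugly numbers.
    if n <= 0:
        return False
    # Same logic as the recursive version, but inside one while (true) loop.
    while True:
        if n == 1:
            return True
        if n % 2 == 0:
            n //= 2
        elif n % 3 == 0:
            n //= 3
        elif n % 5 == 0:
            n //= 5
        else:
            return False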
R:LangChain + Streamlit + Llama: Bringing Conversational AI to Your Local Machine#
Since everyone is probably already quite familiar with large language models and LangChain, I'll only describe them briefly here.
Large language models have attracted enormous attention, and many developers are using them to build chatbots, personal assistants, or content-generation tools. The possibilities they open up have sparked immense enthusiasm in the developer, AI, and NLP communities.
Domain-specific data can be injected into a large language model so that it can answer queries efficiently, which is especially useful for internal company documentation and knowledge bases. The architecture used to achieve this is called "retrieval-augmented generation" or "generative question answering."
What is LangChain? LangChain is a development framework that conveniently chains the components of LLM-powered applications together, allowing developers to quickly build applications such as chatbots.
This article mainly discusses how to create a document assistant using LangChain and the LLaMA 7B model (I personally feel it's a bit outdated, as LLaMA2 is already available).
Article structure:
- Create a virtual environment and file structure
- Pull the large language model locally
- Integrate the large language model into LangChain and customize the Prompt template
- Document retrieval and answer generation
- Create an application using Streamlit
1. Create a virtual environment and file structure#
Create a basic file structure and Python virtual environment, mainly for model files, Notebook files, and the app.py entry file. You can clone the author's repository: DocQA.
2. Pull the large language model locally#
LLaMA is a large language model released by Meta, and LLaMA2 can be used commercially for free. This article uses LLaMA1. Go to HuggingFace to find the LLaMA model and download the bin file to the models directory.
GGML is an open-source machine learning tensor library written in C, which makes it possible to run LLMs on consumer-grade hardware through quantization.
So what is quantization? The weights of LLMs are floating-point numbers, which take up more space and computing power compared to integer values. Quantization reduces the precision of weights to decrease resource usage. GGML supports 4-bit, 5-bit, and 8-bit quantization.
You need to weigh memory, disk space, and model quality when choosing the parameter count and the quantization level: the larger the model and the higher the precision, the better it performs, but the more resources it consumes.
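As a rough back-of-the-envelope illustration of that trade-off (the numbers below count only the raw weights; real GGML files carry some extra overhead):

# Approximate weight storage for a 7B-parameter model at different precisions.
params = 7_000_000_000
for bits in (16, 8, 5, 4):
    gib = params * bits / 8 / 1024**3
    print(f"{bits}-bit: ~{gib:.1f} GiB")
# Prints roughly: 16-bit ~13.0 GiB, 8-bit ~6.5 GiB, 5-bit ~4.1 GiB, 4-bit ~3.3 GiB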
If GGML is a C library, how can it be used in Python? This is where the llama-cpp-python project comes in: it provides Python bindings for llama.cpp, allowing us to run the LLaMA model from Python.
After all this introduction, actually running the model is very simple; it only takes a few lines of Python code.
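The article's snippet isn't reproduced above, but a minimal sketch with llama-cpp-python looks roughly like this (the model filename is assumed to match the GGML file downloaded into the models directory):

from llama_cpp import Llama

# Load the quantized GGML weights from the local models directory.
llm = Llama(model_path="./models/llama-7b.ggmlv3.q4_0.bin")

# Plain text in, plain text out.
output = llm("Q: What is a large language model? A:", max_tokens=128, stop=["Q:"])
print(output["choices"][0]["text"])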
3. Integrate the large language model into LangChain and customize the Prompt template#
For an LLM, its operation can be simplified to: text goes in, text comes out. Accordingly, most of the work in LangChain also revolves around text.
Subtle differences in prompts can lead to significant variations in the performance of LLMs, which is why the concept of Prompt Engineering was developed to consider how to generate higher-quality prompts. To facilitate seamless interaction with LLMs, LangChain provides functionality for developing prompt templates, which typically consist of two parts: text templates and dynamic parameters.
A simple application just needs to pass the prompt and input parameters to the LLM to generate results, but complex applications generally require connecting the LLM with other components. LangChain offers a development approach for chaining components together, allowing for serial calls to a series of components.
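As a minimal sketch of that pattern, using the same components that appear in the full application later in this article (the template text and the example question are just placeholders):

from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A prompt template consists of static text plus dynamic parameters ({question} here).
template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

# Chain the prompt template to the locally loaded LLaMA model.
llm = LlamaCpp(model_path="./models/llama-7b.ggmlv3.q4_0.bin")
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("What does quantization do to a language model?"))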
4. Document retrieval and answer generation#
In many LLM applications, the data a user needs is not part of the model's training dataset and has to be supplied externally through the prompt. LangChain provides the necessary components to load, transform, store, and query this data:
These five steps are: document loading → document transformation → embedding → vector storage → vector retrieval.
The full retrieval walkthrough is quite long, so I won't reproduce it all here; a skeleton sketch follows below. It's worth noting that, because this is a locally deployed solution, the embedding model is not a remote service: LangChain's LlamaCppEmbeddings component performs the embedding with the LLaMA model itself.
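The sketch assembles those five steps from the same components the complete application below uses (the file path and the query string are placeholders):

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Chroma

# 1) Load and 2) split the document into small chunks.
docs = TextLoader("temp/file.txt").load()
texts = CharacterTextSplitter(chunk_size=100, chunk_overlap=0).split_documents(docs)

# 3) Embed locally with the LLaMA model and 4) store the vectors in Chroma.
embeddings = LlamaCppEmbeddings(model_path="./models/llama-7b.ggmlv3.q4_0.bin")
db = Chroma.from_documents(texts, embeddings)

# 5) Retrieve the most similar chunk; it becomes the {context} for the prompt.
similar_doc = db.similarity_search("What is this document about?", k=1)
print(similar_doc[0].page_content)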
5. Create an application using Streamlit#
The author did not go into detail on Streamlit, since it is a fairly optional part of the main workflow. He did emphasize one point about the file-upload step: to avoid memory issues, the uploaded file is saved to a temporary directory as raw.txt; only txt files are supported for now, but it could be modified to support PDF and CSV files as well. Finally, the Streamlit library turns this LangChain-based LLM application into a web application:
# Bring in deps
import streamlit as st
from langchain.llms import LlamaCpp
from langchain.embeddings import LlamaCppEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
# Customize the layout
st.set_page_config(page_title="DOCAI", page_icon="🤖", layout="wide", )
st.markdown(f"""
<style>
.stApp {{background-image: url("https://images.unsplash.com/photo-1509537257950-20f875b03669?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1469&q=80");
background-attachment: fixed;
background-size: cover}}
</style>
""", unsafe_allow_html=True)
# function for writing uploaded file in temp
def write_text_file(content, file_path):
    try:
        with open(file_path, 'w') as file:
            file.write(content)
        return True
    except Exception as e:
        print(f"Error occurred while writing the file: {e}")
        return False
# set prompt template
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Answer:"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
# initialize the LLM & Embeddings
llm = LlamaCpp(model_path="./models/llama-7b.ggmlv3.q4_0.bin")
embeddings = LlamaCppEmbeddings(model_path="models/llama-7b.ggmlv3.q4_0.bin")
llm_chain = LLMChain(llm=llm, prompt=prompt)
st.title("📄 Document Conversation 🤖")
uploaded_file = st.file_uploader("Upload an article", type="txt")
if uploaded_file is not None:
    content = uploaded_file.read().decode('utf-8')
    # st.write(content)
    file_path = "temp/file.txt"
    write_text_file(content, file_path)

    loader = TextLoader(file_path)
    docs = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
    texts = text_splitter.split_documents(docs)
    db = Chroma.from_documents(texts, embeddings)
    st.success("File Loaded Successfully!!")

    # Query through LLM
    question = st.text_input("Ask something from the file", placeholder="Find something similar to: ....this.... in the text?", disabled=not uploaded_file,)
    if question:
        similar_doc = db.similarity_search(question, k=1)
        context = similar_doc[0].page_content
        query_llm = LLMChain(llm=llm, prompt=prompt)
        response = query_llm.run({"context": context, "question": question})
        st.write(response)
Personally, I'm more interested in the Streamlit side of this. Gradio already makes prototype development easy, so more complex application development could well be an opportunity for Streamlit. Unfortunately, this article did not dig into it, and the part I most wanted to see was glossed over.
T:CoDeF#
A video-to-video model that produces stable, good-quality output.
S:SQ3R Reading Method#
SQ3R stands for five words: Survey, Question, Read, Recite, Review. Before studying, first skim the content; then, based on that skim, pose your own questions about what the material covers and what problems it solves. Next, read closely with those questions in mind and find the answers through reading. After that, close the book and recite what it is about, what your questions were, and how the book answers them. Lastly, review to consolidate what you have learned. After these five steps, the content of the book can be truly absorbed.
Reference: