James Briggs is a freelance ML (machine learning) engineer, startup advisor, and developer advocate at Pinecone.
There are many instances where ChatGPT has not learned niche or less popular subjects.
There are two options for helping our LLM (Large Language Model) better understand the topic and answer the question more precisely.
1. We fine-tune the LLM on text data covering the domain of fine-tuning sentence transformers.
2. We use retrieval-augmented generation, meaning we add an information retrieval component to our GQA (Generative Question-Answering) process. Adding a retrieval step allows us to retrieve relevant information and feed it into the LLM as a secondary source of information (see the prompt sketch after this list).
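In practice, "feeding in" retrieved information just means placing it in the prompt ahead of the question. A minimal sketch of that idea (the template wording and example contexts are illustrative assumptions, not James's exact prompt):

```python
# Hypothetical prompt template for retrieval-augmented GQA.
# Retrieved passages are injected ahead of the user's question so the
# LLM answers from them rather than from its training data alone.
contexts = [
    "Sentence transformers can be fine-tuned with MNR loss...",
    "Fine-tuning requires pairs of related sentences...",
]  # in a real system, these come from the retrieval step
question = "How do I fine-tune a sentence transformer?"

prompt = (
    "Answer the question based on the context below.\n\n"
    "Context:\n" + "\n---\n".join(contexts) +
    f"\n\nQuestion: {question}\nAnswer:"
)
print(prompt)
```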
This gives us human-like interaction with machines for information retrieval (IR), aka search. For example, we can take the top twenty pages from Google or Bing and have the chat system scan and summarize those sources.
There are also useful public data sources. The dataset James uses in his example is the jamescalam/youtube-transcriptions dataset hosted on Hugging Face Datasets. It contains transcribed audio from several ML and tech YouTube channels.
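Since the dataset is public, loading it takes a couple of lines (a minimal sketch using Hugging Face's datasets library; treat the exact record fields as an assumption):

```python
from datasets import load_dataset

# Load the transcribed-audio dataset James uses in his example
data = load_dataset("jamescalam/youtube-transcriptions", split="train")

print(data)     # row count and column names
print(data[0])  # one transcript snippet, e.g. its 'text', 'title', and 'url'
```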
James preprocesses the data: the raw transcript snippets cover only a few seconds of speech each, so they are merged into larger chunks before embedding. He uses Pinecone as his vector database.
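A minimal sketch of that indexing step, assuming the era-appropriate openai and pinecone-client Python packages (the index name, chunk size, and Pinecone environment are assumptions):

```python
import openai
import pinecone
from datasets import load_dataset

openai.api_key = "OPENAI_API_KEY"
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")

# text-embedding-ada-002 returns 1536-dimensional vectors
if "youtube-transcriptions" not in pinecone.list_indexes():
    pinecone.create_index("youtube-transcriptions", dimension=1536, metric="cosine")
index = pinecone.Index("youtube-transcriptions")

# Merge short transcript snippets into larger chunks
# (a non-overlapping window of 20 snippets is a simplifying assumption)
data = load_dataset("jamescalam/youtube-transcriptions", split="train")
snippets = [row["text"] for row in data]
window = 20
chunks = [" ".join(snippets[i:i + window]) for i in range(0, len(snippets), window)]

# Embed each chunk and upsert it with its text as metadata
for i, chunk in enumerate(chunks):
    res = openai.Embedding.create(input=[chunk], model="text-embedding-ada-002")
    index.upsert(vectors=[(str(i), res["data"][0]["embedding"], {"text": chunk})])
```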
OpenAI Pinecone
The OpenAI Pinecone (OP) stack is an increasingly popular choice for building high-performance AI apps, including retrieval-augmented GQA.
The pipeline at query time consists of the following (a code sketch follows the list):
* OpenAI Embedding endpoint to create vector representations of each query.
* Pinecone vector database to search for relevant passages from the database of previously indexed contexts.
* OpenAI Completion endpoint to generate a natural language answer considering the retrieved contexts.
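Putting those three steps together, a minimal query-time sketch (same assumed packages and index as above; the prompt wording is an assumption):

```python
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")
index = pinecone.Index("youtube-transcriptions")

query = "How do I fine-tune a sentence transformer?"

# 1. Embed the query with the OpenAI Embedding endpoint
res = openai.Embedding.create(input=[query], model="text-embedding-ada-002")
xq = res["data"][0]["embedding"]

# 2. Search Pinecone for the most relevant previously indexed contexts
matches = index.query(vector=xq, top_k=5, include_metadata=True)
contexts = [m["metadata"]["text"] for m in matches["matches"]]

# 3. Generate a natural language answer that considers those contexts
prompt = (
    "Answer the question based on the context below.\n\n"
    "Context:\n" + "\n---\n".join(contexts) +
    f"\n\nQuestion: {query}\nAnswer:"
)
res = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=256, temperature=0.0
)
print(res["choices"][0]["text"].strip())
```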
LLMs alone work incredibly well but struggle with more niche or specific questions. This often leads to hallucinations that are rarely obvious and likely to go undetected by system users.
By adding a “long-term memory” component to the GQA system, we benefit from an external knowledge base to improve system factuality and user trust in generated outputs.
Naturally, there is vast potential for this type of technology. Despite being a new technology, we are already seeing its use in YouChat, several podcast search apps, and rumors of its upcoming use as a challenger to Google itself.
Generative AI is what many expect to be the next big technology boom, and, being AI, it could have implications reaching far beyond what we'd expect.
One of the most thought-provoking use cases of generative AI belongs to Generative Question-Answering (GQA).
Now, the most straightforward GQA system requires nothing more than a user text query and a large language model (LLM).
We can test this out with OpenAI’s GPT-3, Cohere, or open-source Hugging Face models.
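For example, a bare-bones, LLM-only GQA call with OpenAI's completion endpoint might look like this (the model choice and question are illustrative assumptions):

```python
import openai

openai.api_key = "OPENAI_API_KEY"

# The simplest GQA system: the user's question goes straight to the LLM,
# with no retrieval step and no external knowledge base.
res = openai.Completion.create(
    model="text-davinci-003",
    prompt="Which training method should I use for sentence transformers "
           "when I only have pairs of related sentences?",
    max_tokens=256,
    temperature=0.0,
)
print(res["choices"][0]["text"].strip())
```

Without a retrieval step, the answer depends entirely on what the model memorized during training.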
However, sometimes LLMs need help. For this, we can use retrieval augmentation, which, when applied to LLMs, can be thought of as a form of "long-term memory" for them.
Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technology and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting-edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.