Interactive Q&A Bot with RASA + Haystack + RoBERTa

May 15, 2021

In this article, I present the steps for building an interactive question-answering bot over a K-12 education knowledge base, using a pre-trained Hugging Face transformer model (RoBERTa) fine-tuned on the SQuAD 2.0 Q&A dataset. The solution uses the Haystack framework for document retrieval and the reader pipeline, and Rasa as the chatbot front end to keep the interaction natural.

Architecture

1. Why a BERT Transformer model?

Transformers apply attention mechanisms to gather information about the relevant context of each word and encode that context in a rich vector that represents the word. Unlike LSTMs or RNNs, which read a sequence one token at a time, transformers process all the words in parallel.
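To make the idea of attention concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification, not BERT's actual implementation: the function names and the 2-dimensional toy embeddings are made up for illustration, and the learned query/key/value projections of a real transformer are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    A real transformer first applies learned projections to obtain separate
    query, key and value vectors; that step is omitted in this sketch.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this token to every token in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Context vector: weighted average of all token vectors.
        context = [
            sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)
        ]
        outputs.append(context)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d embeddings
contextual = self_attention(tokens)
```

Each pass through the outer loop depends only on the input vectors, not on the result for the previous token, which is why the computation can run in parallel across tokens.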

BERT (Bidirectional Encoder Representations from Transformers) is a bidirectional transformer that is pre-trained on a large amount of unlabeled text to learn a language representation, which can then be fine-tuned for specific machine learning tasks. BERT models process each word in relation to all the other words in a sentence, rather than one by one in order. They can therefore consider the full context of a word by looking at the words that come before and after it, which is particularly useful for understanding the intent behind search queries.

In addition, we can start from an already available pre-trained BERT model and fine-tune it (transfer learning), which speeds up deployment. A major advantage of pre-trained models is their ability to adapt to specific tasks using relatively small amounts of labeled data, compared to training a model from scratch (see Figure 1).

Figure 1: Accuracy vs. number of training samples for a sentence-level sentiment-classification task. The baseline blue line represents training from scratch and the orange line represents fine-tuning a pre-trained BERT model. Source: “Using Transfer Learning for NLP with Small Data”

2. Advantages of using the Haystack framework

Haystack is a framework for building powerful, production-ready pipelines for different search use cases, including closed-domain question answering (CDQA). It supports state-of-the-art NLP models, including Hugging Face pre-trained transformer models, which shortens training and lets users query the documents in natural language.

Haystack employs an Elasticsearch retriever that searches the entire document store and returns a set of candidate documents, which speeds up querying. Models such as BERT, RoBERTa, and FARM-trained models fine-tuned on datasets like SQuAD can be configured as the reader. The reader takes multiple passages of text as input and returns the top-n answers with corresponding confidence scores. The framework also supports two ways of creating labels: annotating text with questions (and answers) while reading passages, in SQuAD style, or starting from a set of predefined questions and looking for their answers in the documents (similar to Natural Questions).
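The two-stage flow described above, a cheap retriever that narrows the search followed by a reader that picks an answer, can be sketched in plain Python. This is only a toy illustration and does not use Haystack: the term-overlap scoring stands in for Elasticsearch's ranking, the `read` function is a placeholder for the fine-tuned RoBERTa reader, and all names and sample documents are made up.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Rank documents by shared query terms (a crude stand-in for BM25)."""
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(doc))), doc) for doc in documents]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def read(query, candidates):
    """Placeholder reader: pick the candidate sentence with the best overlap.

    A real reader model extracts an answer span and a confidence score.
    """
    query_terms = set(tokenize(query))
    best = None
    for doc in candidates:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            overlap = len(query_terms & set(tokenize(sentence)))
            if best is None or overlap > best[0]:
                best = (overlap, sentence)
    return best[1] if best else None


docs = [
    "Photosynthesis takes place in the chloroplasts. Plants use sunlight to make glucose.",
    "The French Revolution began in 1789.",
    "Mitochondria release energy from glucose.",
]
query = "Where does photosynthesis take place?"
candidates = retrieve(query, docs)
answer = read(query, candidates)
```

The point of the split is cost: the retriever is cheap enough to scan every document, so the expensive reader only has to look at a handful of candidates.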

3. Advantages of using Rasa

The Q&A bot needed an interactive UI, and Rasa stands out in many areas with its unique approach. Rasa is easy to integrate and customize, and its custom and pre-trained intents came in handy for understanding the user’s request.

The Rasa action server runs custom actions: you specify an endpoint, and Rasa calls it whenever a custom action is predicted. The endpoint runs the code for the requested action and returns the result, while the dialogue state is maintained. Rasa also provides a configurable timeout, so the conversation flow is not broken while the model generates a response.

Solution steps

1. Data Collection

The subject content was downloaded from a web link as PDF files.

import urllib.request

url_path = "SUBJECT_CONTENT_LINK"

def download_file(download_url, filename):
    # Fetch the document and save it as <filename>.pdf
    response = urllib.request.urlopen(download_url)
    with open(filename + ".pdf", 'wb') as file:
        file.write(response.read())

download_file(url_path, "Textbooks")

2. Data Wrangling

The right pre-processing steps on the text documents can have a great impact on the accuracy and speed of the model. The following pre-processing steps were applied to all the data files: extract text from the files, normalize white space, remove headers and footers, split the text into smaller sentences, remove empty lines, and normalize the text.
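As a rough illustration of what some of these steps do, here is a small standard-library sketch. It is not Haystack's implementation: the function names are hypothetical, the header and footer lines are passed in explicitly rather than detected automatically, and the sentence splitter is deliberately naive.

```python
import re


def clean_text(raw, header_footer=()):
    """Normalize whitespace, drop empty lines and known header/footer lines."""
    cleaned = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line or line in header_footer:
            continue
        cleaned.append(line)
    return " ".join(cleaned)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


# Made-up page text with a running header, a footer and messy spacing.
page = (
    "Grade 8 Science\n\n"
    "Plants   make food\tby photosynthesis.  \n\n"
    "It needs light.\n"
    "Page 12\n"
)
text = clean_text(page, header_footer={"Grade 8 Science", "Page 12"})
sentences = split_sentences(text)
```

Smaller, cleaner passages give the retriever tighter matches and leave the reader less text to scan per candidate.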

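Preprocessing as directed in Haystack:

# Import paths as in Haystack 0.x; they differ in later releases.
from haystack.file_converter.pdf import PDFToTextConverter
from haystack.preprocessor.utils import convert_files_to_dicts

# Convert a single PDF to text
converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["en"])
doc_pdf = converter.convert(file_path="Textbooks.pdf", meta=None)

# Haystack also has a convenience function that will automatically apply the right converter to each file in a directory.
all_docs = convert_files_to_dicts(dir_path="data/Textbook_data")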

3. Pre-trained model selection

The pretrained models listed below were chosen as model candidates.

- roberta-base-squad2: 12-layer, 768-hidden, 12-heads, 125M parameters. RoBERTa using the BERT-base architecture.
- distilbert-base-uncased: 6-layer, 768-hidden, 12-heads, 66M parameters. The DistilBERT model distilled from the BERT model bert-base-uncased checkpoint.
- miniLM: 12-layer, 384-hidden, 12-heads, 21M parameters, 96M embedding parameters. MiniLM using the BERT-base architecture.

4. Preparing for fine-tuning: Annotation

The Haystack Annotation tool was used extensively, and more than 1,000 closed-domain questions and answers were annotated for training and evaluating the model. The output of the annotation tool is a SQuAD 2.0 JSON file that can be used to fine-tune the model.

Haystack Annotation tool to generate question and answer pairs

The JSON file ‘annotated_train_answers.json’, containing the question-answer pairs, was used to fine-tune the model.
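For readers who have not seen the SQuAD 2.0 layout, the sketch below builds a one-question dataset in that shape and checks that each answer_start offset really points at the answer text. The field names follow the public SQuAD 2.0 format; the helper names, the sample sentence and the id are made up, and the file produced by the annotation tool may carry additional fields.

```python
import json


def make_squad_record(title, context, question, answer_text, qa_id):
    """Build a one-question SQuAD 2.0 style dataset as a Python dict."""
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer text must appear verbatim in the context")
    return {
        "version": "v2.0",
        "data": [{
            "title": title,
            "paragraphs": [{
                "context": context,
                "qas": [{
                    "id": qa_id,
                    "question": question,
                    "is_impossible": False,
                    "answers": [{"text": answer_text, "answer_start": start}],
                }],
            }],
        }],
    }


def check_offsets(dataset):
    """Return ids of questions whose answer_start does not match the context."""
    bad = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad.append(qa["id"])
    return bad


record = make_squad_record(
    title="Physics",
    context="Water boils at 100 degrees Celsius at sea level.",
    question="At what temperature does water boil?",
    answer_text="100 degrees Celsius",
    qa_id="q-0001",
)
serialized = json.dumps(record, indent=2)  # what would be written to disk
mismatches = check_offsets(record)
```

A check like this is cheap to run before fine-tuning, since a shifted character offset silently teaches the reader the wrong span.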

5. Fine-tuning

All three pre-trained models were fine-tuned with the annotated question-answer data (Step 4) on the same hardware configuration, and their performance was compared.

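Fine-tuning the model as directed in Haystack:

# Import paths as in Haystack 0.x and FARM; they differ in later releases.
from farm.utils import initialize_device_settings
from haystack.reader.farm import FARMReader

train_data = "data/Textbook_data"
device, n_gpu = initialize_device_settings(use_cuda=True)

# Initialize the reader model
reader = FARMReader(model_name_or_path="model", use_gpu=True)

# Fine-tune on the annotated data, holding out 30% as a dev set
reader.train(
    data_dir=train_data,
    train_filename="annotated_train_answers.json",
    n_epochs=4,
    dev_split=0.3,
    save_dir="model")

# Note: this evaluates on the same file that was used for training
reader_train_eval_results = reader.eval_on_file(
    data_dir=train_data,
    test_filename="annotated_train_answers.json",
    device=device)
print("Accuracy:", reader_train_eval_results["top_n_accuracy"])
print("F1-Score:", reader_train_eval_results["f1"])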

The RoBERTa model, which outperformed the other two, was selected.
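For context on the F1 score used in the comparison, the sketch below computes exact match and token-level F1 in the usual SQuAD style: answers are lowercased, stripped of punctuation and articles, and then compared token by token. This is an illustrative reimplementation with made-up example strings; the metric computed inside Haystack may differ in detail.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match (unanswerable question).
        return float(pred_tokens == true_tokens)
    common = Counter(pred_tokens) & Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Chloroplast.", "chloroplast")
partial = f1_score("in the chloroplasts of leaf cells", "the chloroplasts")
miss = f1_score("mitochondria", "chloroplasts")
```

F1 gives partial credit when the predicted span overlaps the annotated answer, which makes it more forgiving than exact match for long or loosely bounded answers.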

6. Creating an Elasticsearch image for the Q&A document base

The next step was to create an Elasticsearch Docker image holding the closed-domain documents gathered.

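import subprocess
import time
# Import paths as in Haystack 0.x; they differ in later releases.
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.preprocessor.cleaning import clean_wiki_text
from haystack.preprocessor.utils import convert_files_to_dicts

# Start a single-node Elasticsearch container and give it time to boot
status = subprocess.run('docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.9.2', shell=True)
time.sleep(15)

# Connect to Elasticsearch
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="class")
files_dir = "Textbook_data"  # convert_files_to_dicts expects a directory, not a glob pattern
dicts = convert_files_to_dicts(dir_path=files_dir, clean_func=clean_wiki_text, split_paragraphs=True)
document_store.write_documents(dicts)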

After running the code above, confirm that a new Docker container has been created with the command $ docker ps -a. The container can then be committed as a Docker image with a tag (version/index name) for future use.
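The fixed time.sleep(15) in the snippet above works when the container happens to start quickly, but it can be too short on a slow machine. One alternative, sketched here with the standard library only, is to poll the Elasticsearch HTTP port until it responds. The function name and its parameters are hypothetical; the opener and sleep arguments exist only so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, timeout=60.0, interval=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True once the service responds and False if the deadline is hit.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

With the container from this step running, a call such as wait_until_ready("http://localhost:9200") would take the place of the fixed sleep before connecting the document store.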

7. Bringing the model up: FastAPI and Docker file

A FastAPI endpoint was implemented to handle the action from Rasa by accepting an HTTP POST request. The answer to the query is then sent back to the Rasa action as the HTTP response.

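# IntentRequest, MODEL_ID, FINDERS, Question, search_documents, doc_qa_limiter and
# AnswerTextFromResults are assumed to be defined elsewhere in the REST API module.
import time

@router.post("/new_query")
def new_query(request: IntentRequest):
    queryText = request.queryText
    model_id: int = MODEL_ID
    k_reader: int = DEFAULT_TOP_K_READER
    k_retriever: int = DEFAULT_TOP_K_RETRIEVER
    # Keep the search inside the limiter so concurrent requests are throttled
    with doc_qa_limiter.run():
        start_time = time.time()
        finder = FINDERS.get(model_id, None)
        # Wrap the user's text in the request object expected by the search pipeline
        data = {'questions': [queryText], 'filters': None,
                'top_k_reader': k_reader, 'top_k_retriever': k_retriever}
        myQuestion = Question(**data)
        answers = search_documents(finder, myQuestion, start_time)
    answerText = "No answer available.."
    if answers and len(answers) > 0:
        answerText = AnswerTextFromResults(dict(answers[0]))
    return {'fulfillmentText': answerText}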

The Haystack Docker setup was then updated with the new REST API. In Haystack’s docker-compose.yml, point the model to the fine-tuned RoBERTa model, point the Elasticsearch image to the new image containing the populated Q&A document store, and keep the Streamlit UI service. The framework is then ready.

version: "3"
services:
  haystack-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 8000:8000
    volumes:
      # Folder for mounting the fine-tuned model
      - "./models:/home/user/models"
    environment:
      - DB_HOST=elasticsearch
      - USE_GPU=True
      - TOP_K_PER_SAMPLE=3 # how many answers can come from the same small passage (reduce value for more variety of answers)
    restart: always
    depends_on:
      - elasticsearch
    command: "/bin/bash -c 'sleep 50 && gunicorn rest_api.application:app -b 0.0.0.0 -k uvicorn.workers.UvicornWorker --workers 1 --timeout 180 --preload'"
  elasticsearch:
    # Load the new Elasticsearch image with the closed-domain question-answer documents
    image: "q-and-a:version1"
    ports:
      - 9200:9200
    environment:
      - discovery.type=single-node
  ui:
    # For testing the API
    image: "deepset/haystack-streamlit-ui"
    ports:
      - 8501:8501
    environment:
      - API_ENDPOINT=http://haystack-api:8000

8. Designing and Integrating RASA chat-bot framework

With the model hosted behind FastAPI, a Rasa project was created. The rules.yml file was edited to add a rule for the nlu_fallback intent, so that domain-specific Q&A is delegated to the Rasa action server.

- rule: Ask the user to rephrase whenever they send a message with low NLU confidence
  steps:
    - intent: nlu_fallback
    - action: action_intent_question

Edit actions.py as below to invoke the new FastAPI endpoint created in Haystack and to pass its response (the answers to domain-specific questions) back to the user.

import json
from typing import Any, Dict, List, Text

import requests
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

class ActionHelloWorld(Action):
    def name(self) -> Text:
        return "action_intent_question"

    def run(self, dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        # Forward the user's latest message to the Haystack API
        message = tracker.latest_message.get('text')
        url = "http://localhost:8000/XXX_haystack-api"
        data = {'queryText': message}
        headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
        response = requests.post(url, data=json.dumps(data), headers=headers)
        data = response.json()
        # Send the answer back to the user
        dispatcher.utter_message(text=data['fulfillmentText'])
        return []
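The action above assumes the Haystack API is always reachable and always returns a fulfillmentText field. A more defensive variant of the HTTP call might look like the sketch below. It is an assumption on my part rather than part of the deployed bot: fetch_answer and FALLBACK_TEXT are hypothetical names, the standard-library urllib is used instead of requests, and the opener argument exists only so the function can be tried without a server.

```python
import json
import urllib.error
import urllib.request

FALLBACK_TEXT = "Sorry, I could not reach the answer service."


def fetch_answer(url, message, timeout=10.0, opener=urllib.request.urlopen):
    """POST the user's message as JSON and return the answer text.

    Falls back to a fixed message on network errors, timeouts, invalid JSON
    or a response without a usable 'fulfillmentText' field.
    """
    payload = json.dumps({"queryText": message}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, TimeoutError, ValueError):
        return FALLBACK_TEXT
    if not isinstance(body, dict):
        return FALLBACK_TEXT
    return body.get("fulfillmentText") or FALLBACK_TEXT
```

Inside the action's run method, the result of fetch_answer(url, message) could be passed straight to dispatcher.utter_message, so the user still gets a reply when the model is slow or down.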

9. Fine-tuning the Rasa NLU model

The Rasa NLU model was tuned with the following hyperparameters, defined in config.yml:

# Configuration for Rasa NLU.
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.9
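The FallbackClassifier threshold is what routes questions to Haystack: when the best intent's confidence is below the threshold, the message is labelled nlu_fallback, and the rule from step 8 then triggers action_intent_question. The sketch below mimics only that decision in plain Python. The function name and the sample confidences are made up, and Rasa's real component has further options that are not modelled here.

```python
def classify_with_fallback(intent_ranking, threshold=0.9):
    """Return the top intent, or "nlu_fallback" if its confidence is too low.

    `intent_ranking` maps intent names to confidences, as an intent
    classifier such as DIETClassifier would produce.
    """
    if not intent_ranking:
        return "nlu_fallback"
    top_intent, confidence = max(intent_ranking.items(), key=lambda item: item[1])
    return top_intent if confidence >= threshold else "nlu_fallback"


# Made-up confidences for a greeting and for a textbook question.
greeting = classify_with_fallback({"greet": 0.97, "goodbye": 0.02})
question = classify_with_fallback({"greet": 0.41, "goodbye": 0.38})
```

With a threshold as high as 0.9, only messages the NLU model is very sure about stay with Rasa's own intents. Everything else is treated as a domain question and forwarded to the Q&A model.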

10. Final Deployment with Rasa X front end

Run the docker-compose.yml file in Haystack, which brings up the model along with the Elasticsearch-based retriever. In the Rasa actions folder, run $ rasa run actions to start the Rasa action server. Finally, run ‘rasa x’ and choose the trained Rasa NLU model to bring up the chatbot UI. Rasa was then able to handle generic greeting messages as well as domain-specific Q&A seamlessly.

Conclusion

This post demonstrates that with a pre-trained BERT model and the Haystack framework, one can quickly build an accurate, deployable question-answering model.

By further integrating Rasa and its action server, we can create a scalable and flexible chatbot dialogue management system that can be fine-tuned further and integrated with many front-end applications.

Please feel free to post your questions or suggestions.

References

[1] Haystack Deepset ai https://haystack.deepset.ai

[2] Rasa https://rasa.com

[3] Creating Rasa chatbot using action server https://towardsdatascience.com/create-chatbot-using-rasa-part-1-67f68e89ddad

[4] Deepset AI github https://github.com/deepset-ai/haystack