Wondering if there is any tool, preferably open-source, that generates a human-like summary of a corpus of articles/tweets/books?
Assuming one has lots of article abstracts, I would like to make a one-paragraph summary of all the main points.
It is not open source, but OpenAI offers a "2nd grader summary" example (https://beta.openai.com/examples/default-summarize) and a "tldr-summary" example (https://beta.openai.com/examples/default-tldr-summary) through its API, and there is a free tier.
A few weeks ago I gave it the first few paragraphs of Immanuel Kant's "Critique of Pure Reason" (or the Wikipedia article about it, I don't remember) to summarize. The GPT-3 model could not do this properly, but it responded with a human-like "Listen, pal, such texts are difficult to summarize even for a machine like me..."
The "prompt": argument in the API call is where your text is supposed to be put. So no, I don't think so. I think it is best to submit one article per API request. You can do so relatively easily in a loop, say in a shell script or a Python script. People have also started talking about "prompt engineering", that is, putting the text there in just the right order to get good results: twitter.com/amli_art/status/1549555691181854720 (in this example, for descriptions of AI-generated pixel graphics).
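The one-article-per-request loop suggested above could look roughly like the sketch below. It makes no network calls: `complete` is a hypothetical stand-in for whatever function actually sends the prompt to the API, `fake_complete` is a toy replacement so the loop can be run offline, and the "Tl;dr" suffix is an assumption about how such a prompt might be phrased.

```python
from typing import Callable, Iterable, List


def build_prompt(article: str, suffix: str = "\n\nTl;dr") -> str:
    """Build the text that would go into the "prompt" argument for one article.

    The suffix is an assumption; adjust it to whatever prompt works for you.
    """
    return article.strip() + suffix


def summarize_each(articles: Iterable[str],
                   complete: Callable[[str], str]) -> List[str]:
    """Send one prompt per article and collect the returned summaries.

    `complete` is a hypothetical callable wrapping the real API request.
    """
    return [complete(build_prompt(article)).strip() for article in articles]


def fake_complete(prompt: str) -> str:
    """Offline stand-in for the API call: returns the article's first sentence."""
    body = prompt.rsplit("\n\nTl;dr", 1)[0]
    return body.split(". ")[0].rstrip(".") + "."


if __name__ == "__main__":
    abstracts = [
        "Cats sleep a lot. They also hunt at night.",
        "Rust has no garbage collector. It uses ownership instead.",
    ]
    for summary in summarize_each(abstracts, fake_complete):
        print(summary)
```

To use it for real, one would swap `fake_complete` for a function that performs the actual request and returns the completion text.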
One may use the Python library GPT Index (MIT-licensed, but it relies on GPT-3, which is closed-source and non-free) to summarize a collection of documents. From the documentation:
index = GPTTreeIndex(documents)
response = index.query("<summarization_query>", mode="summarize")
The “default” mode for a tree-based query is traversing from the top of the graph down to leaf nodes. For summarization purposes we will want to use mode="summarize". A summarization query could look like one of the following:
- “What is a summary of this collection of text?”
- “Give me a summary of person X’s experience with the company.”
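To give a feel for what a tree over a document collection means, here is a rough illustration of hierarchical summarization in general. It is not GPT Index's actual implementation: `tree_summarize`, its `num_children` parameter as used here, and the toy `first_words` summarizer are all made up for this sketch, and a real setup would pass in a function that calls a language model.

```python
from typing import Callable, List


def tree_summarize(chunks: List[str],
                   summarize: Callable[[str], str],
                   num_children: int = 2) -> str:
    """Merge groups of nodes into parent summaries until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if num_children < 2:
        raise ValueError("num_children must be at least 2")
    level = list(chunks)  # leaf nodes
    while len(level) > 1:
        # Each parent node is the summary of up to `num_children` child nodes.
        level = [
            summarize("\n".join(level[i:i + num_children]))
            for i in range(0, len(level), num_children)
        ]
    return level[0]  # root node; a single input chunk is returned unchanged


def first_words(text: str) -> str:
    """Toy stand-in for a model call: keeps the first word of each line."""
    return " ".join(line.split()[0] for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    print(tree_summarize(["alpha one", "beta two", "gamma three"], first_words))
```

The point of the tree shape is that no single call ever sees more than `num_children` pieces of text at once, which matters when the whole collection would not fit into one prompt.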
FYI {1} is a great paper looking at GPT-3 performance for summarization, but they only looked at short texts, not collections of texts.
References:
One may use the Python library https://github.com/hwchase17/langchain (MIT license) to summarize a collection of documents.
From the documentation:
from langchain.chains.summarize import load_summarize_chain
# llm: a LangChain LLM wrapper; docs: a list of LangChain Document objects
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)
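For readers unfamiliar with the term, "map_reduce" here means roughly: summarize every document on its own (map), then summarize the combined partial summaries (reduce). The sketch below shows only that idea in plain Python and is not LangChain's code; `map_reduce_summarize` and the toy `first_sentence` summarizer are hypothetical names for illustration.

```python
from typing import Callable, List


def map_reduce_summarize(docs: List[str],
                         summarize: Callable[[str], str]) -> str:
    """Summarize each document, then summarize the joined partial summaries."""
    if not docs:
        raise ValueError("need at least one document")
    partial = [summarize(doc) for doc in docs]   # map step: one call per document
    return summarize("\n\n".join(partial))       # reduce step: one final call


def first_sentence(text: str) -> str:
    """Toy stand-in for a model call: returns the text up to the first period."""
    return text.split(".")[0].strip() + "."


if __name__ == "__main__":
    docs = ["A is good. More text here.", "B is bad. Even more text."]
    print(map_reduce_summarize(docs, first_sentence))
```

With a real model in place of `first_sentence`, the map step keeps each request small, at the cost of one extra call per document.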