2

Is there any tool, preferably open source, that generates a human-like summary of a corpus of articles, tweets, or books?

Given a large number of article abstracts, I would like to produce a one-paragraph summary of all their main points.

1
  • I have not come across an open-source one. Check out something like autotldr.io. There was also an app called Summly, which Yahoo bought and then discontinued.
    – Z Z
    Jul 20, 2022 at 16:35

3 Answers

2

It is not open source, but OpenAI has a "2nd grader summary" (https://beta.openai.com/examples/default-summarize) and a "tldr-summary" (https://beta.openai.com/examples/default-tldr-summary) API example for this, and there is a free tier.

A few weeks ago I gave it the first few paragraphs of Immanuel Kant's "Critique of Pure Reason" (or the Wikipedia article about it, I don't remember) to summarize. The GPT-3 model could not do this properly, but it responded with a human-like "Listen, pal, such texts are difficult to summarize even for a machine like me..."

2
  • Thank you @knb. Do you know if they allow the summarization of more than 1 article? Jul 20, 2022 at 20:51
  • There is a single "prompt" argument in the API call where your text is supposed to go, so no, I don't think so. I think it is best to submit one article per API request. You can do that fairly easily in a loop, say in a shell script or a Python script. People have also started talking about "prompt engineering": putting the text there in just the right order to get good results, e.g. twitter.com/amli_art/status/1549555691181854720 (in this example, for descriptions of AI-generated pixel graphics).
    – knb
    Jul 21, 2022 at 7:14
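The one-article-per-request loop described in the comment above could look roughly like the sketch below. The names `build_tldr_prompt`, `summarize_each`, and `fake_complete` are hypothetical, and `complete` stands for whatever function sends a prompt to the API and returns the completion text. Appending a "Tl;dr" cue after the article is an assumption modeled on the tldr-summary example, and a stub replaces the real API call so the code runs offline.

```python
from typing import Callable, Iterable, List


def build_tldr_prompt(article: str) -> str:
    """Put the article text first and a 'Tl;dr' cue last (assumed prompt layout)."""
    return article.strip() + "\n\nTl;dr"


def summarize_each(articles: Iterable[str], complete: Callable[[str], str]) -> List[str]:
    """Send one prompt per article and collect the stripped completions."""
    return [complete(build_tldr_prompt(article)).strip() for article in articles]


def fake_complete(prompt: str) -> str:
    """Offline stand-in for a real API call: returns the article's first sentence."""
    body = prompt.rsplit("\n\nTl;dr", 1)[0]
    return body.split(". ")[0].rstrip(".") + "."


if __name__ == "__main__":
    abstracts = ["Cats sleep a lot. They also purr.", "Dogs bark. They fetch."]
    for summary in summarize_each(abstracts, fake_complete):
        print(summary)
```

To use a real model, pass a function that wraps the actual API request in place of `fake_complete`; the loop itself does not change.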
2

One may use the Python library GPT Index (MIT license, but it relies on GPT-3, which is closed source and non-free) to summarize a collection of documents. From the documentation:

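from gpt_index import GPTTreeIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()  # "data" is a placeholder directory of text files
index = GPTTreeIndex(documents)
response = index.query("<summarization_query>", mode="summarize")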

The “default” mode for a tree-based query is traversing from the top of the graph down to leaf nodes. For summarization purposes we will want to use mode="summarize".

A summarization query could look like one of the following:

  • “What is a summary of this collection of text?”
  • “Give me a summary of person X’s experience with the company.”
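To give a feel for how a tree of summaries can cover a whole collection, here is a minimal sketch of bottom-up hierarchical summarization. It is not GPT Index's actual implementation: `tree_summarize`, `fanout`, and the `bracket` stub are hypothetical names, and `summarize` stands for any function that turns a block of text into a shorter one (in practice, a call to a language model).

```python
from typing import Callable, List


def tree_summarize(chunks: List[str], summarize: Callable[[str], str], fanout: int = 2) -> str:
    """Summarize groups of `fanout` nodes, level by level, until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(chunks)
    while len(level) > 1:
        # Each parent node is the summary of up to `fanout` children.
        level = [
            summarize("\n".join(level[i:i + fanout]))
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def bracket(text: str) -> str:
    """Stub summarizer that only marks which children were merged."""
    return "[" + text.replace("\n", "+") + "]"


if __name__ == "__main__":
    print(tree_summarize(["a", "b", "c"], bracket))
```

With the stub, the output shows the shape of the tree rather than a real summary. Note that a single input chunk is returned unchanged, since there is nothing to merge.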

FYI, {1} is a great paper on GPT-3's summarization performance, but it only looks at short texts, not collections of texts.


References:

0

One may use the Python library https://github.com/hwchase17/langchain (MIT license) to summarize a collection of documents.

From the documentation:

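from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
llm = OpenAI(temperature=0)  # needs the OPENAI_API_KEY environment variable
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)  # docs: a list of langchain Document objects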
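The idea behind a "map_reduce" summarization chain can be sketched without any library: summarize each document on its own (map), then summarize the concatenated partial summaries (reduce). This is only an illustration of the general pattern, not langchain's code. `map_reduce_summarize`, `first_sentence`, and `one_paragraph` are hypothetical names, and the two stubs stand in for language-model calls.

```python
from typing import Callable, List


def map_reduce_summarize(
    docs: List[str],
    summarize_one: Callable[[str], str],
    combine: Callable[[str], str],
) -> str:
    """Map: summarize each document. Reduce: summarize the joined partial summaries."""
    if not docs:
        raise ValueError("need at least one document")
    partial_summaries = [summarize_one(doc) for doc in docs]
    return combine("\n".join(partial_summaries))


def first_sentence(text: str) -> str:
    """Stub for the map step: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def one_paragraph(text: str) -> str:
    """Stub for the reduce step: merge the partial summaries into one paragraph."""
    return " ".join(text.splitlines())


if __name__ == "__main__":
    abstracts = ["Cats sleep. They purr.", "Dogs bark. They fetch."]
    print(map_reduce_summarize(abstracts, first_sentence, one_paragraph))
```

Because each map call sees only one document, the per-request input stays small; the reduce step is the only place where the whole collection has to fit into a single prompt, and then only in summarized form.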
