Human-like summarization of collection of texts

Question

I am wondering whether there is any tool, preferably open source, that generates a human-like summary of a corpus of articles, tweets, or books.

Assuming one has lots of article abstracts, I would like to make a one-paragraph summary of all the main points.

I have not come across an open-source one. Check out something like autotldr.io. There was also an app called Summly, which Yahoo bought and then discontinued. – Z Z, Jul 20, 2022 at 16:35

knb · Accepted Answer · 2022-07-20 08:41:36Z

2

It is not open source, but OpenAI offers a "2nd grader summary" example (https://beta.openai.com/examples/default-summarize) and a "tldr-summary" example (https://beta.openai.com/examples/default-tldr-summary) for its API, and there is a free tier.

A few weeks ago I gave it the first few paragraphs of Immanuel Kant's "Critique of Pure Reason" (or of the Wikipedia article about it, I don't remember) to summarize. The GPT-3 model could not do this properly, but it responded with a human-like "Listen, pal, such texts are difficult to summarize even for a machine like me..."


Thank you @knb. Do you know if they allow the summarization of more than 1 article?
– Gonçalo Peres
Jul 20, 2022 at 20:51
There is a single "prompt" argument in the API call where your text is supposed to go, so no, I don't think so. I think it is best to submit one article per API request. You can do so relatively easily in a loop, say in a shell script or a Python script. People have also started talking about "prompt engineering", that is, arranging the text in the prompt in just the right order to get good results: twitter.com/amli_art/status/1549555691181854720 (in this example, for descriptions of AI-generated pixel graphics).
– knb
Jul 21, 2022 at 7:14
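
The "one article per request" loop mentioned in the comment above could look roughly like the following minimal Python sketch. It is not OpenAI's client code: `complete` is a hypothetical callable standing in for whatever function actually sends the prompt to the API, `fake_complete` is a placeholder so the loop can run offline, and the "Tl;dr" suffix in `build_prompt` is an assumed prompt format.

```python
from typing import Callable, List


def build_prompt(article: str) -> str:
    """Wrap one article in a TL;DR-style prompt (the wording is an assumption)."""
    return f"{article.strip()}\n\nTl;dr"


def summarize_each(articles: List[str], complete: Callable[[str], str]) -> List[str]:
    """Send one prompt per article and collect the replies in input order."""
    summaries = []
    for article in articles:
        prompt = build_prompt(article)
        summaries.append(complete(prompt).strip())
    return summaries


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for the API call: returns the first sentence."""
    return prompt.split(".")[0] + "."


if __name__ == "__main__":
    articles = ["Cats sleep a lot. They also hunt.", "Dogs bark. They guard houses."]
    for summary in summarize_each(articles, fake_complete):
        print(summary)
```

To use a real model, one would pass a function that performs the API request in place of `fake_complete`; the loop itself stays the same.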


Franck Dernoncourt · Accepted Answer · 2023-01-17 04:58:12Z

One may use the Python library GPT Index to summarize a collection of documents. The library itself is MIT-licensed, but it relies on GPT-3, which is closed-source and not free. From the documentation:

    index = GPTTreeIndex(documents)
    response = index.query("<summarization_query>", mode="summarize")

The “default” mode for a tree-based query is traversing from the top of the graph down to leaf nodes. For summarization purposes we will want to use mode="summarize".

A summarization query could look like one of the following:

“What is a summary of this collection of text?”

“Give me a summary of person X’s experience with the company.”
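
To give a rough idea of what a tree built over a collection of texts can look like, here is a minimal sketch that summarizes chunks in small groups and then summarizes those summaries until a single root remains. It is an illustration under stated assumptions, not GPT Index's implementation: `build_summary_tree`, `fanout`, and `fake_summarize` are hypothetical names, and `fake_summarize` only stands in for a real LLM call.

```python
from typing import Callable, List


def build_summary_tree(
    leaves: List[str],
    summarize: Callable[[str], str],
    fanout: int = 2,
) -> List[List[str]]:
    """Return the tree level by level: leaves first, root summary last."""
    if not leaves:
        raise ValueError("need at least one text chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = [list(leaves)]
    # Each pass summarizes groups of `fanout` siblings into one parent node.
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = [
            summarize("\n".join(current[i:i + fanout]))
            for i in range(0, len(current), fanout)
        ]
        levels.append(parents)
    return levels


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keeps the first word of each line."""
    return " ".join(line.split()[0] for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    tree = build_summary_tree(["alpha one", "beta two", "gamma three"], fake_summarize)
    print(tree[-1][0])  # the root holds the summary of the whole collection
```

Because each parent sees only a few children at a time, no single call has to fit the whole collection into one prompt.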

FYI {1} is a great paper looking at GPT-3 performance for summarization, but they only looked at short texts, not collections of texts.

References:

{1} Goyal, Tanya, Junyi Jessy Li, and Greg Durrett. "News Summarization and Evaluation in the Era of GPT-3." arXiv preprint arXiv:2209.12356 (2022).

Franck Dernoncourt · Accepted Answer · 2023-02-12 04:17:54Z

0

One may use the Python library LangChain (https://github.com/hwchase17/langchain, MIT license) to summarize a collection of documents.

From the documentation:

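    from langchain.chains.summarize import load_summarize_chain

    # llm is a LangChain LLM wrapper and docs is a list of LangChain Document objects; both must be defined beforehand.
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    chain.run(docs)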
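
For readers unfamiliar with the "map_reduce" idea, the following plain-Python sketch shows the general pattern: summarize each document on its own (map), then summarize the concatenated partial summaries (reduce). It is not LangChain's implementation; `map_reduce_summary` and `fake_summarize` are hypothetical names, and `fake_summarize` is a placeholder for a real LLM call.

```python
from typing import Callable, List


def map_reduce_summary(docs: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each document (map), then summarize the joined summaries (reduce)."""
    if not docs:
        raise ValueError("need at least one document")
    partial_summaries = [summarize(doc) for doc in docs]  # map step
    combined = "\n".join(partial_summaries)
    return summarize(combined)  # reduce step


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: truncates to the first five words."""
    return " ".join(text.split()[:5])


if __name__ == "__main__":
    docs = ["First abstract about topic A and more.", "Second abstract about topic B."]
    print(map_reduce_summary(docs, fake_summarize))
```

The map step makes one call per document, so a collection of N documents costs N + 1 calls in this sketch, and the joined partial summaries must still fit into a single prompt for the reduce step.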

