import html2text
from IPython.display import Markdown

# content = soup.select("#mw-content-text > div.mw-content-ltr.mw-parser-output")[0]
content = soup.select_one("#mw-content-text > div.mw-content-ltr.mw-parser-output")
paragraphs = content.find_all('p')

intro_paras = []
for p in paragraphs:
    # Check if we've hit an h2 heading's div
    if p.find_previous('div', class_='mw-heading mw-heading2'):
        break
    if p.text.strip():  # Only keep non-empty paragraphs
        intro_paras.append(p)

for p in intro_paras:
    if p.text.strip():
        print(p.text.strip())
        print("\n---\n")
Venom in snakes and some lizards is a form of saliva that has been modified into venom over its evolutionary history.[1] In snakes, venom has evolved to kill or subdue prey, as well as to perform other diet-related functions.[2] While snakes occasionally use their venom in self defense, this is not believed to have had a strong effect on venom evolution.[3] The evolution of venom is thought to be responsible for the enormous expansion of snakes across the globe.[4][5][6]
---
The evolutionary history of snake venom is a matter of debate. Historically, snake venom was believed to have evolved once, at the base of the Caenophidia, or derived snakes. Molecular studies published beginning in 2006 suggested that venom originated just once among a putative clade of reptiles, called Toxicofera, approximately 170 million years ago.[7] Under this hypothesis, the original toxicoferan venom was a very simple set of proteins that were assembled in a pair of glands. Subsequently, this set of proteins diversified in the various lineages of toxicoferans, including Serpentes, Anguimorpha, and Iguania: several snake lineages also lost the ability to produce venom.[8][9] The Toxicoferan hypothesis was challenged by studies in the mid-2010s, including a 2015 study which found that venom proteins had homologs in many other tissues in the Burmese python.[10][11] The study therefore suggested that venom had evolved independently in different reptile lineages, including once in the Caenophid snakes.[10] Venom containing most extant toxin families is believed to have been present in the last common ancestor of the Caenophidia: these toxins subsequently underwent tremendous diversification, accompanied by changes in the morphology of venom glands and delivery systems.[12]
---
Snake venom evolution is thought to be driven by an evolutionary arms race between venom proteins and prey physiology.[13] The common mechanism of evolution is thought to be gene duplication followed by natural selection for adaptive traits.[14] The adaptations produced by this process include venom more toxic to specific prey in several lineages,[15][16][17] proteins that pre-digest prey,[18] and a method to track down prey after a bite.[19] These various adaptations of venom have also led to considerable debate about the definition of venom and venomous snakes.[20] Changes in the diet of a lineage have been linked to atrophication of the venom.[8][9]
---
AI Prompt
Can you wrap this in a function that takes a url and returns intro paragraphs that are already stripped?
AI Response
I'll help you create a function that extracts and cleans the introduction paragraphs from a Wikipedia article. Here's the code:
import httpx
from bs4 import BeautifulSoup

def get_wiki_intro(url):
    """
    Get the introduction paragraphs from a Wikipedia article.
    Returns a list of cleaned paragraph texts.
    """
    response = httpx.get(url)
    soup = BeautifulSoup(response.text, 'lxml')

    # Get the main content
    content = soup.select_one("#mw-content-text > div.mw-content-ltr.mw-parser-output")
    paragraphs = content.find_all('p')

    # Collect intro paragraphs (stop at first heading)
    intro_texts = []
    for p in paragraphs:
        # Break if we hit a heading
        if p.find_previous('div', class_='mw-heading mw-heading2'):
            break
        # Only keep non-empty paragraphs
        text = p.text.strip()
        if text:
            intro_texts.append(text)
    return intro_texts

# Test it
url = "https://en.wikipedia.org/wiki/Evolution_of_snake_venom"
intros = get_wiki_intro(url)
for p in intros:
    print(p)
    print("\n---\n")
Would you like to test this function with a different Wikipedia article to make sure it works as expected?
import httpx
from bs4 import BeautifulSoup

def get_wiki_intro(url):
    """
    Get the introduction paragraphs from a Wikipedia article.
    Returns a list of cleaned paragraph texts.
    """
    response = httpx.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    content = soup.select_one("#mw-content-text > div.mw-content-ltr.mw-parser-output")
    paragraphs = content.find_all('p')
    intro_texts = []
    for p in paragraphs:
        if p.find_previous('div', class_='mw-heading mw-heading2'):
            break
        text = p.text.strip()
        if text:
            intro_texts.append(text)
    return intro_texts
url = "https://en.wikipedia.org/wiki/Evolution_of_snake_venom"
intros = get_wiki_intro(url)
for p in intros:
    print(p)
    print("\n---\n")
Venom in snakes and some lizards is a form of saliva that has been modified into venom over its evolutionary history.[1] In snakes, venom has evolved to kill or subdue prey, as well as to perform other diet-related functions.[2] While snakes occasionally use their venom in self defense, this is not believed to have had a strong effect on venom evolution.[3] The evolution of venom is thought to be responsible for the enormous expansion of snakes across the globe.[4][5][6]
---
The evolutionary history of snake venom is a matter of debate. Historically, snake venom was believed to have evolved once, at the base of the Caenophidia, or derived snakes. Molecular studies published beginning in 2006 suggested that venom originated just once among a putative clade of reptiles, called Toxicofera, approximately 170 million years ago.[7] Under this hypothesis, the original toxicoferan venom was a very simple set of proteins that were assembled in a pair of glands. Subsequently, this set of proteins diversified in the various lineages of toxicoferans, including Serpentes, Anguimorpha, and Iguania: several snake lineages also lost the ability to produce venom.[8][9] The Toxicoferan hypothesis was challenged by studies in the mid-2010s, including a 2015 study which found that venom proteins had homologs in many other tissues in the Burmese python.[10][11] The study therefore suggested that venom had evolved independently in different reptile lineages, including once in the Caenophid snakes.[10] Venom containing most extant toxin families is believed to have been present in the last common ancestor of the Caenophidia: these toxins subsequently underwent tremendous diversification, accompanied by changes in the morphology of venom glands and delivery systems.[12]
---
Snake venom evolution is thought to be driven by an evolutionary arms race between venom proteins and prey physiology.[13] The common mechanism of evolution is thought to be gene duplication followed by natural selection for adaptive traits.[14] The adaptations produced by this process include venom more toxic to specific prey in several lineages,[15][16][17] proteins that pre-digest prey,[18] and a method to track down prey after a bite.[19] These various adaptations of venom have also led to considerable debate about the definition of venom and venomous snakes.[20] Changes in the diet of a lineage have been linked to atrophication of the venom.[8][9]
---
AI Prompt
Please show me how to use claudette to get the following information from the text into a structured data:
How interesting the article is. Rating from 1 to 10 (not interesting to very interesting). Assuming the reader does not know anything about the topic.
How difficult the article is. 1-10 (beginner to expert). Assuming the reader does not know anything about the topic.
What are some prerequisite information that the reader should know before reading this article.
What are some key takeaways from the article.
AI Response
I'll help you create a structured data class and use Claudette to analyze the Wikipedia article content. Here's how we can do this:
class ArticleAnalysis:
    "Analysis of a Wikipedia article"
    def __init__(self,
                 interest_rating: int,      # Rating from 1-10 of how interesting the article is
                 difficulty_rating: int,    # Rating from 1-10 of how difficult the article is
                 prerequisites: list[str],  # List of topics reader should know beforehand
                 key_takeaways: list[str],  # List of main points from the article
                ):
        assert 1 <= interest_rating <= 10, "Interest rating must be between 1 and 10"
        assert 1 <= difficulty_rating <= 10, "Difficulty rating must be between 1 and 10"
        store_attr()
    __repr__ = basic_repr('interest_rating, difficulty_rating, prerequisites, key_takeaways')

# Create a chat instance with the article analysis tool
from claudette import Chat, Client, models
chat = Chat(models[1], tools=[ArticleAnalysis])

# Get analysis for an article
def analyze_article(url):
    intro_texts = get_wiki_intro(url)
    prompt = f"""Based on these introduction paragraphs from a Wikipedia article:

{' '.join(intro_texts)}

Analyze this for a general reader with no prior knowledge of the topic. Return:
- An interest rating (1-10)
- A difficulty rating (1-10)
- A list of prerequisite knowledge needed
- Key takeaways from the text

Focus on making this accessible to someone new to the topic."""
    return chat.structured(prompt, ArticleAnalysis)

# Test it
url = "https://en.wikipedia.org/wiki/Evolution_of_snake_venom"
analysis = analyze_article(url)
print(analysis)
Let's run this and see what Claude comes up with! This will give us structured data that we can easily store or process further. Would you like me to explain any part of this code in more detail?
Note that we're using:
- The store_attr() function which I see was imported in your context
- The basic_repr helper which I also see was available
- The Sonnet model (models[1]) as recommended in the Claudette documentation
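If store_attr and basic_repr are unfamiliar, here is a rough stdlib-only sketch of what they do. These simplified stand-ins (the names store_attr_sketch and basic_repr_sketch are invented here) omit the edge cases the real fastcore helpers handle:

```python
import inspect, re

def store_attr_sketch(self):
    # Approximates fastcore's store_attr(): copy the calling
    # function's arguments onto self as attributes
    frame = inspect.currentframe().f_back
    args, _, _, values = inspect.getargvalues(frame)
    for name in args:
        if name != 'self': setattr(self, name, values[name])

def basic_repr_sketch(flds):
    # Approximates fastcore's basic_repr(): build a __repr__
    # from a comma-separated field list
    names = re.split(r',\s*', flds)
    def _repr(self):
        inner = ', '.join(f'{n}={getattr(self, n)!r}' for n in names)
        return f'{type(self).__name__}({inner})'
    return _repr

class Demo:
    def __init__(self, a, b): store_attr_sketch(self)
    __repr__ = basic_repr_sketch('a, b')

d = Demo(1, 'x')
print(d.a, d.b)  # 1 x
print(d)         # Demo(a=1, b='x')
```

This is why ArticleAnalysis above can skip the usual `self.x = x` boilerplate and still get a readable repr.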
Would you like to test this with the snake venom article or try it with a different Wikipedia article?
from claudette import Chat, Client, models
from fastcore.utils import *
models
class ArticleAnalysis:
    "Analysis of a Wikipedia article"
    def __init__(self,
                 interest_rating: int,      # Rating 1-10 of how interesting the article is for a general reader
                 difficulty_rating: int,    # Rating 1-10 of how difficult the article is for a general reader
                 prerequisites: list[str],  # List of topics reader should know before reading
                ):
        assert 1 <= interest_rating <= 10, "Interest rating must be between 1 and 10"
        assert 1 <= difficulty_rating <= 10, "Difficulty rating must be between 1 and 10"
        store_attr()
    __repr__ = basic_repr('interest_rating, difficulty_rating, prerequisites')
chat = Chat(models[1])
client = Client(models[1])
article_text = "\n\n".join(intros)
prompt = f"""Here's a Wikipedia article introduction:

<article>
{article_text}
</article>

Analyze this text and provide:
1. An interest rating (1-10) for general readers with no prior knowledge
2. A difficulty rating (1-10) for general readers with no prior knowledge
3. Prerequisites that would help understand this article

Be objective in your analysis and provide the information in the exact structure requested."""
result = client.structured(prompt, ArticleAnalysis)
print(result[0])
ArticleAnalysis(interest_rating=7, difficulty_rating=8, prerequisites=['Basic biology concepts', 'Understanding of evolution', 'Knowledge of DNA and proteins', 'Familiarity with scientific terms like homolog and lineage', 'Basic understanding of natural selection'])
result
[ArticleAnalysis(interest_rating=7, difficulty_rating=8, prerequisites=['Basic biology concepts', 'Understanding of evolution', 'Knowledge of DNA and proteins', 'Familiarity with scientific terms like homolog and lineage', 'Basic understanding of natural selection'])]
AI Prompt
How does the prompt look? Can you improve it?
AI Response
The prompt is good, but we can improve it to get more consistent and structured responses. Here's an enhanced version:
prompt = f"""Here's a Wikipedia article introduction:

<article>
{article_text}
</article>

Analyze this text and provide a structured analysis in exactly this format:

INTEREST RATING (1-10):
- Choose a rating where 1 means "not engaging for general readers" and 10 means "highly fascinating for anyone"
- Consider factors like real-world relevance, wow-factor, and storytelling
- Rate assuming the reader has no prior knowledge of the topic

DIFFICULTY RATING (1-10):
- Choose a rating where 1 means "easily understood by anyone" and 10 means "requires expert knowledge"
- Consider technical terminology, complex concepts, and assumed knowledge
- Rate assuming the reader has no specialized background

PREREQUISITES:
- List 3-5 basic concepts or topics that would help understand this article better
- Focus on fundamental ideas, not advanced concepts
- Each prerequisite should be a single line starting with "-"

Provide your analysis in a way that can be directly parsed into the ArticleAnalysis class structure."""
result = chat.structured(prompt, ArticleAnalysis)
The improvements include:
1. Clear rating scale definitions for both interest and difficulty
2. Specific criteria for how to rate each aspect
3. Structured format for prerequisites and takeaways
4. Explicit number ranges for lists to ensure consistency
5. Clear formatting instructions for list items
6. Reminder about the target audience (general readers)
7. Explicit mention that it needs to be parseable for the ArticleAnalysis class
Would you like me to run this improved version to see how it performs?
prompt = f"""Here's a Wikipedia article introduction:

<article>
{article_text}
</article>

Analyze this text and provide a structured analysis in exactly this format:

INTEREST RATING (1-10):
- Choose a rating where 1 means "not engaging for general readers" and 10 means "highly fascinating for anyone"
- Consider factors like real-world relevance, wow-factor, and storytelling
- Rate assuming the reader has no prior knowledge of the topic

DIFFICULTY RATING (1-10):
- Choose a rating where 1 means "easily understood by anyone" and 10 means "requires expert knowledge"
- Consider technical terminology, complex concepts, and assumed knowledge
- Rate assuming the reader has no specialized background

PREREQUISITES:
- List 3-5 basic concepts or topics that would help understand this article better
- Focus on fundamental ideas, not advanced concepts
- Each prerequisite should be a single line starting with "-"

Provide your analysis in a way that can be directly parsed into the ArticleAnalysis class structure."""
result = client.structured(prompt, ArticleAnalysis)
print(result[0])
ArticleAnalysis(interest_rating=8, difficulty_rating=7, prerequisites=['Basic understanding of evolution and natural selection', 'Knowledge of what proteins and genes are', 'Familiarity with basic animal classification', 'Understanding of what saliva is and its function'])
AI Prompt
Can you wrap this in a function so I can loop it through other articles?? Please wrap problem_text in tags.
AI Response
Here's a function that combines everything we've done, with properly tagged text and error handling:
def analyze_wiki_article(url):
    """
    Analyzes a Wikipedia article and returns structured information about
    its difficulty, interest level, prerequisites, and key takeaways.

    Args:
        url (str): URL of the Wikipedia article

    Returns:
        ArticleAnalysis: Structured analysis of the article
    """
    try:
        # Get article intro
        intro_texts = get_wiki_intro(url)
        article_text = "\n\n".join(intro_texts)

        # Create prompt with tagged text
        prompt = f"""Here's a Wikipedia article introduction:

<article_text>
{article_text}
</article_text>

Analyze this text and provide a structured analysis in exactly this format:

INTEREST RATING (1-10):
- Choose a rating where 1 means "not engaging for general readers" and 10 means "highly fascinating for anyone"
- Consider factors like real-world relevance, wow-factor, and storytelling
- Rate assuming the reader has no prior knowledge of the topic

DIFFICULTY RATING (1-10):
- Choose a rating where 1 means "easily understood by anyone" and 10 means "requires expert knowledge"
- Consider technical terminology, complex concepts, and assumed knowledge
- Rate assuming the reader has no specialized background

PREREQUISITES:
- List 3-5 basic concepts or topics that would help understand this article better
- Focus on fundamental ideas, not advanced concepts
- Each prerequisite should be a single line starting with "-"

KEY TAKEAWAYS:
- List 4-5 main points from the article
- Focus on the most important concepts
- Make them accessible to general readers

Provide your analysis in a way that can be directly parsed into the ArticleAnalysis class structure."""

        # Get analysis
        client = Client(models[1])
        result = client.structured(prompt, ArticleAnalysis)
        return result[0]
    except Exception as e:
        print(f"Error analyzing {url}: {str(e)}")
        return None

# Example usage:
urls = [
    "https://en.wikipedia.org/wiki/Evolution_of_snake_venom",
    "https://en.wikipedia.org/wiki/CRISPR",
    # Add more URLs here
]

analyses = {}
for url in urls:
    analysis = analyze_wiki_article(url)
    if analysis:
        analyses[url] = analysis
        print(f"\nAnalysis for {url}:")
        print(analysis)
This function:
1. Combines the wiki scraping and analysis into one function
2. Adds proper error handling
3. Includes more detailed prompting for better analysis
4. Returns structured data that can be easily stored or processed
5. Can be used in a loop for multiple articles
urls = [
    "https://en.wikipedia.org/wiki/Genetic_engineering",
    "https://en.wikipedia.org/wiki/RNA_world",
    "https://en.wikipedia.org/wiki/Bioinformatics"
]
results = {url: analyze_wiki_article(url) for url in urls}
Would you like to test it with some specific articles?
AI Prompt
Divide analyze_wiki_article into smaller bits to make them more testable.
AI Response
I'll help break down the analyze_wiki_article function into smaller, more testable components:
def create_analysis_prompt(article_text: str) -> str:
    """
    Creates a structured prompt for article analysis.

    Args:
        article_text (str): The article text to analyze

    Returns:
        str: Formatted prompt for Claude
    """
    return f"""Here's a Wikipedia article introduction:

<article_text>
{article_text}
</article_text>

Analyze this text and provide a structured analysis in exactly this format:

INTEREST RATING (1-10):
- Choose a rating where 1 means "not engaging for general readers" and 10 means "highly fascinating for anyone"
- Consider factors like real-world relevance, wow-factor, and storytelling
- Rate assuming the reader has no prior knowledge of the topic

DIFFICULTY RATING (1-10):
- Choose a rating where 1 means "easily understood by anyone" and 10 means "requires expert knowledge"
- Consider technical terminology, complex concepts, and assumed knowledge
- Rate assuming the reader has no specialized background

PREREQUISITES:
- List 3-5 basic concepts or topics that would help understand this article better
- Focus on fundamental ideas, not advanced concepts
- Each prerequisite should be a single line starting with "-"

Provide your analysis in a way that can be directly parsed into the ArticleAnalysis class structure."""

def get_article_analysis(article_text: str, client: Client) -> ArticleAnalysis:
    """
    Gets structured analysis of article text using Claude.

    Args:
        article_text (str): The article text to analyze
        client (Client): Claudette client instance

    Returns:
        ArticleAnalysis: Structured analysis of the article
    """
    prompt = create_analysis_prompt(article_text)
    result = client.structured(prompt, ArticleAnalysis)
    return result[0]

def analyze_wiki_article(url: str, client: Client = None) -> ArticleAnalysis:
    """
    Analyzes a Wikipedia article and returns structured information.

    Args:
        url (str): URL of the Wikipedia article
        client (Client): Optional Claudette client instance

    Returns:
        ArticleAnalysis: Structured analysis of the article
    """
    try:
        # Get article intro
        intro_texts = get_wiki_intro(url)
        article_text = "\n\n".join(intro_texts)

        # Create client if not provided
        if client is None:
            client = Client(models[1])

        # Get analysis
        return get_article_analysis(article_text, client)
    except Exception as e:
        print(f"Error analyzing {url}: {str(e)}")
        return None

def analyze_multiple_articles(urls: list[str]) -> dict:
    """
    Analyzes multiple Wikipedia articles.

    Args:
        urls (list[str]): List of Wikipedia URLs

    Returns:
        dict: Dictionary mapping URLs to their analyses
    """
    client = Client(models[1])  # Create single client instance
    analyses = {}
    for url in urls:
        analysis = analyze_wiki_article(url, client)
        if analysis:
            analyses[url] = analysis
            print(f"\nAnalysis for {url}:")
            print(analysis)
    return analyses

# Example usage:
if __name__ == "__main__":
    # Test single article
    url = "https://en.wikipedia.org/wiki/CRISPR"
    analysis = analyze_wiki_article(url)
    print(f"\nSingle article analysis:\n{analysis}")

    # Test multiple articles
    urls = [
        "https://en.wikipedia.org/wiki/Genetic_engineering",
        "https://en.wikipedia.org/wiki/RNA_world",
        "https://en.wikipedia.org/wiki/Bioinformatics"
    ]
    results = analyze_multiple_articles(urls)
Now we have:
1. create_analysis_prompt: Creates the prompt text (easy to test and modify)
2. get_article_analysis: Handles the Claude interaction (can be tested with mock client)
3. analyze_wiki_article: Coordinates the whole process for one article
4. analyze_multiple_articles: Handles multiple articles efficiently
Benefits of this structure:
- Each function has a single responsibility
- Easy to test each component independently
- Can mock the Claude client for testing
- Reuse the client instance for multiple articles
- Better error isolation
- Functions are more readable and maintainable
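To make the mock-client point concrete, here is one way such a test could look. Everything in this sketch is a stand-in: FakeClient is a hypothetical double mimicking claudette Client's structured signature, the ArticleAnalysis class is a minimal copy of the one defined earlier, and the prompt is abbreviated for the test:

```python
class ArticleAnalysis:
    "Minimal copy of the class defined earlier, for an offline test"
    def __init__(self, interest_rating, difficulty_rating, prerequisites):
        assert 1 <= interest_rating <= 10
        assert 1 <= difficulty_rating <= 10
        self.interest_rating = interest_rating
        self.difficulty_rating = difficulty_rating
        self.prerequisites = prerequisites

class FakeClient:
    "Hypothetical test double: records the prompt, returns a canned result"
    def __init__(self): self.last_prompt = None
    def structured(self, prompt, cls):
        self.last_prompt = prompt
        return [cls(5, 5, ['none'])]

def get_article_analysis(article_text, client):
    # Same shape as the real function above, with an abbreviated prompt
    prompt = f"Analyze this:\n<article_text>\n{article_text}\n</article_text>"
    return client.structured(prompt, ArticleAnalysis)[0]

fake = FakeClient()
analysis = get_article_analysis("Some intro text", fake)
assert analysis.interest_rating == 5
assert "Some intro text" in fake.last_prompt  # prompt was built correctly
```

Because get_article_analysis takes the client as a parameter, no API call (or key) is needed to verify the prompt construction and result unwrapping.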
Would you like me to show how to test any of these components?
def create_analysis_prompt(article_text: str) ->str:""" Creates a structured prompt for article analysis. Args: article_text (str): The article text to analyze Returns: str: Formatted prompt for Claude """returnf"""Here's a Wikipedia article introduction:<article_text>{article_text}</article_text>Analyze this text and provide a structured analysis in exactly this format:INTEREST RATING (1-10):- Choose a rating where 1 means "not engaging for general readers" and 10 means "highly fascinating for anyone"- Consider factors like real-world relevance, wow-factor, and storytelling- Rate assuming the reader has no prior knowledge of the topicDIFFICULTY RATING (1-10):- Choose a rating where 1 means "easily understood by anyone" and 10 means "requires expert knowledge"- Consider technical terminology, complex concepts, and assumed knowledge- Rate assuming the reader has no specialized backgroundPREREQUISITES:- List 3-5 basic concepts or topics that would help understand this article better- Focus on fundamental ideas, not advanced concepts- Each prerequisite should be a single line starting with "-"Provide your analysis in a way that can be directly parsed into the ArticleAnalysis class structure."""
def get_article_analysis(article_text: str, client: Client) -> ArticleAnalysis:""" Gets structured analysis of article text using Claude. Args: article_text (str): The article text to analyze client (Client): Claudette client instance Returns: ArticleAnalysis: Structured analysis of the article """ prompt = create_analysis_prompt(article_text) result = client.structured(prompt, ArticleAnalysis)return result[0]get_article_analysis(article_text, client)
ArticleAnalysis(interest_rating=8, difficulty_rating=7, prerequisites=['Basic understanding of evolution and natural selection', 'Knowledge of what proteins are', 'Familiarity with family trees/common ancestors', 'Basic understanding of DNA and genes'])
def analyze_wiki_article(url: str, client: Client =None) -> ArticleAnalysis:""" Analyzes a Wikipedia article and returns structured information. Args: url (str): URL of the Wikipedia article client (Client): Optional Claudette client instance Returns: ArticleAnalysis: Structured analysis of the article """try:# Get article intro intro_texts = get_wiki_intro(url) article_text ="\n\n".join(intro_texts)# Create client if not providedif client isNone: client = Client(models[1])# Get analysisreturn get_article_analysis(article_text, client)exceptExceptionas e:print(f"Error analyzing {url}: {str(e)}")returnNoneanalyze_wiki_article(url, client)
ArticleAnalysis(interest_rating=8, difficulty_rating=7, prerequisites=['Basic understanding of evolution and natural selection', 'Knowledge of what proteins and genes are', 'Familiarity with basic animal classification', 'Understanding of what saliva is and its function'])
def analyze_multiple_articles(urls: list[str]) ->dict:""" Analyzes multiple Wikipedia articles. Args: urls (list[str]): List of Wikipedia URLs Returns: dict: Dictionary mapping URLs to their analyses """ client = Client(models[1]) # Create single client instance analyses = {}for url in urls: analysis = analyze_wiki_article(url, client)if analysis: analyses[url] = analysisprint(f"\nAnalysis for {url}:")print(analysis)return analysesurls = ["https://en.wikipedia.org/wiki/Genetic_engineering","https://en.wikipedia.org/wiki/RNA_world","https://en.wikipedia.org/wiki/Bioinformatics"]results = analyze_multiple_articles(urls)results
Analysis for https://en.wikipedia.org/wiki/Genetic_engineering:
ArticleAnalysis(interest_rating=8, difficulty_rating=7, prerequisites=['Basic cell biology and DNA structure', 'Understanding of genes and inheritance', 'Basic knowledge of proteins and enzymes', 'Familiarity with agriculture and food production'])
Analysis for https://en.wikipedia.org/wiki/RNA_world:
ArticleAnalysis(interest_rating=8, difficulty_rating=7, prerequisites=['Basic cell biology', 'DNA and genetic information', 'Chemical molecules and bonds', 'Evolution fundamentals'])
Analysis for https://en.wikipedia.org/wiki/Bioinformatics:
ArticleAnalysis(interest_rating=7, difficulty_rating=8, prerequisites=['Basic biology and cell concepts', 'Fundamental computer programming concepts', 'Basic statistics and data analysis', 'Understanding of DNA and genes'])
{'https://en.wikipedia.org/wiki/Genetic_engineering': ArticleAnalysis(interest_rating=8, difficulty_rating=7, prerequisites=['Basic cell biology and DNA structure', 'Understanding of genes and inheritance', 'Basic knowledge of proteins and enzymes', 'Familiarity with agriculture and food production']),
'https://en.wikipedia.org/wiki/RNA_world': ArticleAnalysis(interest_rating=8, difficulty_rating=7, prerequisites=['Basic cell biology', 'DNA and genetic information', 'Chemical molecules and bonds', 'Evolution fundamentals']),
'https://en.wikipedia.org/wiki/Bioinformatics': ArticleAnalysis(interest_rating=7, difficulty_rating=8, prerequisites=['Basic biology and cell concepts', 'Fundamental computer programming concepts', 'Basic statistics and data analysis', 'Understanding of DNA and genes'])}
AI Prompt
I want to upgrade analyze_multiple_articles to use ThreadPoolExecutor if we are in interactive environment and ProcessPoolExecutor if we are running it in a script. Can you do this?
AI Response
Here's an upgraded version of analyze_multiple_articles that uses parallel execution with the appropriate executor based on the environment:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
import sys
import threading

def is_interactive():
    """Check if we're running in an interactive environment (like Jupyter/IPython)"""
    return hasattr(sys, 'ps1') or bool(sys.flags.interactive) or 'ipykernel' in sys.modules

def analyze_article_wrapper(url: str, client: Client = None) -> tuple[str, ArticleAnalysis]:
    """
    Wrapper function for parallel processing that returns both url and analysis.

    Args:
        url (str): URL of the Wikipedia article
        client (Client): Optional Claudette client instance

    Returns:
        tuple: (url, analysis result)
    """
    analysis = analyze_wiki_article(url, client)
    return url, analysis

def analyze_multiple_articles(urls: list[str], max_workers: int = None) -> dict:
    """
    Analyzes multiple Wikipedia articles in parallel.
    Uses ThreadPoolExecutor in interactive environments and ProcessPoolExecutor in scripts.

    Args:
        urls (list[str]): List of Wikipedia URLs
        max_workers (int, optional): Maximum number of parallel workers

    Returns:
        dict: Dictionary mapping URLs to their analyses
    """
    # Create single client instance for thread safety
    client = Client(models[1])

    # Choose executor based on environment
    executor_class = ThreadPoolExecutor if is_interactive() else ProcessPoolExecutor

    # If max_workers not specified, derive one from the workload
    if max_workers is None:
        max_workers = min(len(urls), threading.active_count() * 5)

    analyses = {}
    print(f"Using {executor_class.__name__} with {max_workers} workers")

    with executor_class(max_workers=max_workers) as executor:
        # Submit all tasks
        future_to_url = {executor.submit(analyze_article_wrapper, url, client): url
                         for url in urls}

        # Process completed tasks as they finish
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                url, analysis = future.result()
                if analysis:
                    analyses[url] = analysis
                    print(f"\nAnalysis for {url}:")
                    print(analysis)
            except Exception as e:
                print(f"Error analyzing {url}: {str(e)}")

    return analyses

# Example usage:
if __name__ == "__main__":
    urls = [
        "https://en.wikipedia.org/wiki/Genetic_engineering",
        "https://en.wikipedia.org/wiki/RNA_world",
        "https://en.wikipedia.org/wiki/Bioinformatics",
        "https://en.wikipedia.org/wiki/CRISPR",
        "https://en.wikipedia.org/wiki/DNA_sequencing"
    ]
    results = analyze_multiple_articles(urls)
Key improvements:
1. Automatically detects interactive vs script environment
2. Uses ThreadPoolExecutor for interactive (Jupyter/IPython) environments
3. Uses ProcessPoolExecutor for script execution
4. Handles task completion as they finish (doesn't wait for all to complete)
5. Includes proper error handling for each task
6. Dynamic worker count based on system resources
7. Thread-safe client handling
You can use it the same way as before:
# For testing a few articles
urls = [
    "https://en.wikipedia.org/wiki/Genetic_engineering",
    "https://en.wikipedia.org/wiki/RNA_world",
    "https://en.wikipedia.org/wiki/Bioinformatics"
]
results = analyze_multiple_articles(urls)

# Or with custom number of workers
results = analyze_multiple_articles(urls, max_workers=2)
Note that when using ProcessPoolExecutor (in script mode), each process gets its own Claude client instance, which can be more efficient for large numbers of articles but uses more API tokens. The ThreadPoolExecutor (interactive mode) shares a single client instance among threads, which is more token-efficient but potentially slower for large numbers of articles.
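The process-results-as-they-finish loop in analyze_multiple_articles is the standard library's as_completed idiom; here is a minimal, self-contained illustration of the same pattern (square and the small input set are purely for demonstration):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=3) as ex:
    # Map each future back to its input, mirroring future_to_url above
    futures = {ex.submit(square, n): n for n in (1, 2, 3)}
    results = {}
    for fut in as_completed(futures):  # yields futures as each one finishes
        results[futures[fut]] = fut.result()

print(results)  # completion order varies, but the contents are {1: 1, 2: 4, 3: 9}
```

Mapping futures back to their inputs in a dict is what lets the loop report each result immediately, without waiting for the slowest task.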
Would you like me to explain any part in more detail or show how to test it?
ArticleAnalysis(interest_rating=8, difficulty_rating=6, prerequisites=['Basic fish anatomy', 'What vertebrates are', 'Understanding of animal classification', 'Concept of species diversity'])
ArticleAnalysis(interest_rating=8, difficulty_rating=5, prerequisites=['Basic geography and map reading', 'Concept of monarchy and democracy', 'Basic world history timeline', 'Understanding of government systems'])
ArticleAnalysis(interest_rating=9, difficulty_rating=3, prerequisites=['Basic understanding of comic books and superheroes', 'Concept of fiction and entertainment media', 'Basic knowledge of American pop culture', 'Understanding of what a vigilante/crime-fighter is'])
ArticleAnalysis(interest_rating=7, difficulty_rating=3, prerequisites=['Basic understanding of emojis and digital communication', 'Familiarity with Unicode and character encoding concepts', 'Knowledge of cross-cultural communication', 'Basic awareness of Japanese pop culture'])
AI Prompt
What would be very boring for people? To get an interest rating below 3?
AI Response
Let's try some topics that might be considered less engaging for general readers. Here are some URLs we can test that might get low interest ratings:
boring_urls = [
    "https://en.wikipedia.org/wiki/ISO_8601",                # Date/time formatting standard
    "https://en.wikipedia.org/wiki/Null_pointer",            # Technical computing concept
    "https://en.wikipedia.org/wiki/Administrative_division", # Bureaucratic organization
    "https://en.wikipedia.org/wiki/List_of_ISO_standards",   # Technical standards list
    "https://en.wikipedia.org/wiki/Tax_form",                # Tax documentation
]

# Test these potentially boring articles
client = Client(models[1])
for url in boring_urls:
    result = analyze_wiki_article(url, client)
    print(f"\nURL: {url}")
    print(result)
I've chosen these because they:
1. Are highly technical/bureaucratic
2. Lack narrative elements
3. Have limited real-world relevance for most people
4. Don't contain surprising or fascinating elements
5. Are mostly about administrative or technical standards
Let's run this and see if we can find articles that Claude rates below 3 for interest! It would be interesting to see if Claude actually rates anything that low, given that it tends to find interesting aspects in most topics.
URL: https://en.wikipedia.org/wiki/ISO_8601
ArticleAnalysis(interest_rating=6, difficulty_rating=7, prerequisites=['Basic understanding of calendars and time zones', 'Familiarity with different date formats (US vs European)', 'Knowledge of what standards are and why they matter', 'Basic computer character concepts (ASCII/text)'])
URL: https://en.wikipedia.org/wiki/Null_pointer
ArticleAnalysis(interest_rating=7, difficulty_rating=8, prerequisites=['Basic computer programming concepts', 'Understanding of computer memory', 'What variables and data types are', 'Concept of program errors and crashes'])
URL: https://en.wikipedia.org/wiki/Administrative_division
ArticleAnalysis(interest_rating=5, difficulty_rating=7, prerequisites=['Basic geography concepts', 'What a sovereign state is', 'Concept of government and authority', 'Basic political terminology'])
URL: https://en.wikipedia.org/wiki/List_of_ISO_standards
ArticleAnalysis(interest_rating=3, difficulty_rating=6, prerequisites=['Basic understanding of what standards are', 'Knowledge of what organizations and committees are', 'Familiarity with copyright concepts', 'Basic understanding of technical documentation'])
URL: https://en.wikipedia.org/wiki/Tax_form
ArticleAnalysis(interest_rating=6, difficulty_rating=4, prerequisites=['Basic understanding of income and money', 'Concept of government and public services', 'Understanding of forms and paperwork', 'Basic knowledge of what taxes are'])
AI Prompt
Do you agree with Claude?
AI Response
Looking at Claude's ratings, I notice some interesting patterns that I don't entirely agree with:
ISO_8601 (rated 6/10): This seems too high for what is essentially a technical standard about date/time formatting. While it has practical relevance, the topic itself is quite dry and mainly about standardization rules.
Null_pointer (rated 7/10): This rating seems particularly inflated. While null pointers are important in programming, the concept is quite technical and abstract for general readers.
Administrative_division (rated 5/10): This rating feels closer to reality. It's a dry topic about how countries divide their territories administratively.
List_of_ISO_standards (rated 3/10): This is the only rating that feels about right. A list of technical standards is indeed one of the least engaging topics for general readers.
Tax_form (rated 6/10): This seems too high. While tax forms are relevant to many people's lives, the topic itself is generally considered quite dry and bureaucratic.
Let's try some potentially even more boring topics:
more_boring_urls = [
    "https://en.wikipedia.org/wiki/Form_1040",                # Specific tax form
    "https://en.wikipedia.org/wiki/ASCII_decimal_code",       # Character encoding details
    "https://en.wikipedia.org/wiki/Binary_logarithm",         # Mathematical concept
    "https://en.wikipedia.org/wiki/HTTP_header_field",        # Technical web standard
    "https://en.wikipedia.org/wiki/Regular_expression_syntax" # Programming syntax
]

for url in more_boring_urls:
    result = analyze_wiki_article(url, client)
    print(f"\nURL: {url}")
    print(result)
I think Claude tends to be overly optimistic in its interest ratings, possibly because it's trained to find educational value in topics. A more realistic scale might rate these technical and administrative topics between 2-4 for general reader interest.
URL: https://en.wikipedia.org/wiki/Form_1040
ArticleAnalysis(interest_rating=6, difficulty_rating=7, prerequisites=['Basic understanding of taxation', 'Understanding of government forms and paperwork', 'Knowledge of calendar year and fiscal deadlines', 'Basic math skills'])
Error analyzing https://en.wikipedia.org/wiki/ASCII_decimal_code: 'NoneType' object has no attribute 'find_all'
URL: https://en.wikipedia.org/wiki/ASCII_decimal_code
None
URL: https://en.wikipedia.org/wiki/Binary_logarithm
ArticleAnalysis(interest_rating=7, difficulty_rating=8, prerequisites=['Basic algebra and exponents', 'Understanding of what logarithms are', 'Binary number system basics', 'Concept of frequency in music', 'Basic computer terminology'])
Error analyzing https://en.wikipedia.org/wiki/HTTP_header_field: 'NoneType' object has no attribute 'find_all'
URL: https://en.wikipedia.org/wiki/HTTP_header_field
None
Error analyzing https://en.wikipedia.org/wiki/Regular_expression_syntax: 'NoneType' object has no attribute 'find_all'
URL: https://en.wikipedia.org/wiki/Regular_expression_syntax
None
AI Prompt
? ??
AI Response
Ah, I see several issues here:
Some of the URLs failed with a "'NoneType' object has no attribute 'find_all'" error - this suggests these Wikipedia pages either don't exist or have a different structure. We should fix our URL list.
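A minimal sketch of how the analysis could fail soft instead of raising: guard for `None` before calling `find_all` on the result of `select_one`. The fetching and analysis here are hypothetical stubs (a dict standing in for the scraper) so the pattern is self-contained:

```python
# Hypothetical fetcher: returns the article's content container, or None when
# the page is missing or structured differently - the same situation that
# produced the "'NoneType' object has no attribute 'find_all'" errors above,
# because soup.select_one(...) returns None on no match.
def fetch_intro(url, pages):
    return pages.get(url)

def safe_analyze(url, pages):
    content = fetch_intro(url, pages)
    if content is None:  # guard before touching .find_all() etc.
        print(f"Skipping {url}: no parsable article content")
        return None
    return f"analysis of {content}"

pages = {"https://en.wikipedia.org/wiki/Form_1040": "intro text"}
print(safe_analyze("https://en.wikipedia.org/wiki/Form_1040", pages))
print(safe_analyze("https://en.wikipedia.org/wiki/ASCII_decimal_code", pages))
```

The same early-return check dropped into `analyze_wiki_article` would turn the crashes above into skipped entries.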
For the ones that did work:
Form 1040 got a 6/10 interest rating
Binary logarithm got a 7/10 interest rating
Claude is still being way too generous! Let's try some different boring topics that definitely exist on Wikipedia:
These topics are:
1. Extremely technical
2. Purely administrative
3. Have no narrative element
4. Deal with file formats and systems
5. Are about organizational structures
Let's see if we can finally get Claude to rate something below a 5!
URL: https://en.wikipedia.org/wiki/Comma-separated_values
ArticleAnalysis(interest_rating=5, difficulty_rating=6, prerequisites=['Basic computer file concepts', 'Understanding of spreadsheets and data tables', 'Concept of text files', 'Basic data organization principles'])
URL: https://en.wikipedia.org/wiki/Indent_style
ArticleAnalysis(interest_rating=4, difficulty_rating=6, prerequisites=['Basic understanding of what computer programs are', 'Familiarity with text formatting and spacing', 'Knowledge of what source code looks like', 'Basic text editing concepts'])
URL: https://en.wikipedia.org/wiki/File_allocation_table
ArticleAnalysis(interest_rating=6, difficulty_rating=7, prerequisites=['Basic computer file storage concepts', 'What operating systems are (Windows, MS-DOS)', 'What storage devices are (hard drives, SD cards)', 'Understanding of digital storage units (bytes, capacity)'])
URL: https://en.wikipedia.org/wiki/Document_management_system
ArticleAnalysis(interest_rating=6, difficulty_rating=7, prerequisites=['Basic computer file systems', 'Digital file formats', 'Business workflow concepts', 'Data storage fundamentals'])
URL: https://en.wikipedia.org/wiki/Chart_of_accounts
ArticleAnalysis(interest_rating=3, difficulty_rating=7, prerequisites=['Basic accounting terminology', 'Understanding of business financial statements', 'Concept of organizational record-keeping', 'Familiarity with ledgers and transactions'])
AI Prompt
Do you agree?
AI Response
Finally, we got some more realistic ratings! Let me analyze each one:
Comma-separated values (5/10): Still feels high. This is literally just about putting commas between values in a file. Should be a 2-3 at most for general readers.
Indent style (4/10): Getting closer! This is about how to format code with spaces and tabs. The 4 rating is more realistic, though I'd still argue it could be lower.
File allocation table (6/10): Way too high! This is about disk storage organization methods - extremely technical and dry. Should be around 2/10 for general readers.
Document management system (6/10): Also too high. While it has practical applications, the article is about administrative systems for organizing files. Should be around 3/10.
Chart of accounts (3/10): Finally! A more realistic rating. This accounting organizational tool is indeed one of the driest topics possible for general readers.
Let's try one more set with some infamously boring topics:
ultimate_boring_urls = [
    "https://en.wikipedia.org/wiki/Procurement",              # Corporate buying processes
    "https://en.wikipedia.org/wiki/Data_definition_language", # Database syntax
    "https://en.wikipedia.org/wiki/Meeting_minutes",          # Meeting documentation
    "https://en.wikipedia.org/wiki/XML_Schema_(W3C)",         # XML validation rules
    "https://en.wikipedia.org/wiki/Invoice_processing"        # Invoice handling
]

for url in ultimate_boring_urls:
    result = analyze_wiki_article(url, client)
    print(f"\nURL: {url}")
    print(result)
I think Claude has a bias towards seeing educational value as inherently interesting, which makes it rate even the driest topics higher than a general reader would. The fact that we finally got some 3-4 ratings suggests these topics are indeed among the least engaging possible!