= "https://en.wikipedia.org/wiki/Category:Educational_and_science_YouTubers"
url
import httpx
= httpx.get(url)
p p
<Response [200 OK]>
= "https://en.wikipedia.org/wiki/Category:Educational_and_science_YouTubers"
url
import httpx
= httpx.get(url)
p p
<Response [200 OK]>
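Note that httpx doesn't raise on error statuses by itself; before parsing, a quick guard (an addition, not part of the original session) would be:
p.raise_for_status()  # raises httpx.HTTPStatusError on any non-2xx response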
AI Prompt
How can I extract the list of links? The whole list has a selector of #mw-pages > div > div
AI Response
Would you like to use BeautifulSoup to parse the HTML? We can start by importing it and creating a soup object. Would you like me to guide you through the steps, or would you prefer me to write the code for you?
Before we proceed, do you have experience with BeautifulSoup or HTML parsing?
AI Prompt
A bit, but it’s been a while
AI Response
Let’s refresh your memory step by step. First, let’s import BeautifulSoup and create a soup object from your response text. Would you like to try writing this part?
The basic syntax is:
from bs4 import BeautifulSoup
And then we’ll need to create a soup object from the p.text
we already have. Would you like to try that?
from bs4 import BeautifulSoup
soup = BeautifulSoup(p.text, 'html.parser')
soup
<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-disabled vector-feature-custom-font-size-clientpref--excluded vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-sticky-header-enabled vector-toc-not-available" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Category:Educational and science YouTubers - Wikipedia</title>
<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-disabled vector-feature-custom-font-size-clientpref--excluded vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-sticky-header-enabled vector-toc-not-available";var cookie=document.cookie.match(/(?:^|; )enwikimwclientpreferences=([^;]+)/);if(cookie){cookie[1].split('%2C').forEach(function(pref){className=className.replace(new RegExp('(^| )'+pref.replace(/-clientpref-\w+$|[^\w-]+/g,'')+'-clientpref-\\w+( |$)'),'$1'+pref+'$2');});}document.documentElement.className=className;}());RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":
"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"9c9ed43e-ddaf-46a6-9279-82994a74c4fa","wgCanonicalNamespace":"Category","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":14,"wgPageName":"Category:Educational_and_science_YouTubers","wgTitle":"Educational and science YouTubers","wgCurRevisionId":1245444139,"wgRevisionId":1245444139,"wgArticleId":63545800,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["YouTubers by genre","Science communicators","Educators","Online edutainment","Education-related YouTube channels"],"wgPageViewLanguage":"en","wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgRelevantPageName":"Category:Educational_and_science_YouTubers","wgRelevantArticleId":63545800,"wgIsProbablyEditable":true,"wgRelevantPageIsProbablyEditable":true,"wgRestrictionEdit":[],"wgRestrictionMove":[],
"wgNoticeProject":"wikipedia","wgCiteReferencePreviewsActive":false,"wgFlaggedRevsParams":{"tags":{"status":{"levels":1}}},"wgMediaViewerOnClick":true,"wgMediaViewerEnabledByDefault":true,"wgPopupsFlags":0,"wgVisualEditor":{"pageLanguageCode":"en","pageLanguageDir":"ltr","pageVariantFallbacks":"en"},"wgMFDisplayWikibaseDescriptions":{"search":true,"watchlist":true,"tagline":false,"nearby":true},"wgWMESchemaEditAttemptStepOversample":false,"wgWMEPageLength":200,"wgRelatedArticlesCompat":[],"wgEditSubmitButtonLabelPublish":true,"wgULSPosition":"interlanguage","wgULSisCompactLinksEnabled":false,"wgVector2022LanguageInHeader":true,"wgULSisLanguageSelectorEmpty":false,"wgWikibaseItemId":"Q97429229","wgCheckUserClientHintsHeadersJsApi":["brands","architecture","bitness","fullVersionList","mobile","model","platform","platformVersion"],"GEHomepageSuggestedEditsEnableTopics":true,"wgGETopicsMatchModeEnabled":false,"wgGEStructuredTaskRejectionReasonTextInputEnabled":false,
"wgGELevelingUpEnabledForUser":false};RLSTATE={"ext.globalCssJs.user.styles":"ready","site.styles":"ready","user.styles":"ready","ext.globalCssJs.user":"ready","user":"ready","user.options":"loading","mediawiki.action.styles":"ready","mediawiki.helplink":"ready","skins.vector.search.codex.styles":"ready","skins.vector.styles":"ready","skins.vector.icons":"ready","ext.tmh.player.styles":"ready","ext.wikimediamessages.styles":"ready","
========== TRUNCATED ==========
" id="ca-talk-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-speechBubbles mw-ui-icon-wikimedia-speechBubbles"></span>
<span></span>
</a>
<a class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only" data-event-name="subject-sticky-header" href="#" id="ca-subject-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-article mw-ui-icon-wikimedia-article"></span>
<span></span>
</a>
<a class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only" data-event-name="history-sticky-header" href="#" id="ca-history-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-wikimedia-history mw-ui-icon-wikimedia-wikimedia-history"></span>
<span></span>
</a>
<a class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only mw-watchlink" data-event-name="watch-sticky-header" href="#" id="ca-watchstar-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-wikimedia-star mw-ui-icon-wikimedia-wikimedia-star"></span>
<span></span>
</a>
<a class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only" data-event-name="wikitext-edit-sticky-header" href="#" id="ca-edit-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-wikimedia-wikiText mw-ui-icon-wikimedia-wikimedia-wikiText"></span>
<span></span>
</a>
<a class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only" data-event-name="ve-edit-sticky-header" href="#" id="ca-ve-edit-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-wikimedia-edit mw-ui-icon-wikimedia-wikimedia-edit"></span>
<span></span>
</a>
<a class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only" data-event-name="ve-edit-protected-sticky-header" href="#" id="ca-viewsource-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-wikimedia-editLock mw-ui-icon-wikimedia-wikimedia-editLock"></span>
<span></span>
</a>
</div>
<div class="vector-sticky-header-buttons">
<button class="cdx-button cdx-button--weight-quiet mw-interlanguage-selector" data-event-name="ui.dropdown-p-lang-btn-sticky-header" id="p-lang-btn-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-wikimedia-language mw-ui-icon-wikimedia-wikimedia-language"></span>
<span>2 languages</span>
</button>
<a class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--action-progressive" data-event-name="addsection-sticky-header" href="#" id="ca-addsection-sticky-header" tabindex="-1"><span class="vector-icon mw-ui-icon-speechBubbleAdd-progressive mw-ui-icon-wikimedia-speechBubbleAdd-progressive"></span>
<span>Add topic</span>
</a>
</div>
<div class="vector-sticky-header-icon-end">
<div class="vector-user-links">
</div>
</div>
</div>
</div>
</div>
<div class="vector-settings" id="p-dock-bottom">
<ul></ul>
</div><script>(RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgHostname":"mw-web.eqiad.main-55d995bd96-6dstt","wgBackendResponseTime":170,"wgPageParseReport":{"limitreport":{"cputime":"0.045","walltime":"0.058","ppvisitednodes":{"value":25,"limit":1000000},"postexpandincludesize":{"value":1918,"limit":2097152},"templateargumentsize":{"value":0,"limit":2097152},"expansiondepth":{"value":3,"limit":100},"expensivefunctioncount":{"value":4,"limit":500},"unstrip-depth":{"value":0,"limit":20},"unstrip-size":{"value":889,"limit":5000000},"entityaccesscount":{"value":0,"limit":400},"timingprofile":["100.00% 41.462 1 Template:Portal","100.00% 41.462 1 -total"]},"scribunto":{"limitreport-timeusage":{"value":"0.022","limit":"10.000"},"limitreport-memusage":{"value":852853,"limit":52428800}},"cachereport":{"origin":"mw-web.codfw.main-75c76787f7-pmx8k","timestamp":"20250114224832","ttl":2592000,"transientcontent":false}}});});</script>
</body>
</html>
========== MIDDLE OF OUTPUT TRUNCATED ==========
AI Prompt
All right, what next?
AI Response
Now we can use the CSS selector you mentioned to find the content. Let’s try to find all the links within that section. Here’s how we can start:
content = soup.select('#mw-pages > div > div')
Try running this and let me know what you get. Then we can look at how to extract the actual links from this section. Would you like to try that?
content = soup.select('#mw-pages > div > div')
content
[<div class="mw-category mw-category-columns"><div class="mw-category-group"><h3>B</h3>
<ul><li><a href="/wiki/Raven_Baxter" title="Raven Baxter">Raven Baxter</a></li>
<li><a href="/wiki/L%C3%A9a_Bello" title="Léa Bello">Léa Bello</a></li>
<li><a href="/wiki/Big_Jet_TV" title="Big Jet TV">Big Jet TV</a></li>
<li><a href="/wiki/Martin_Bonnet" title="Martin Bonnet">Martin Bonnet</a></li>
<li><a href="/wiki/BrainCraft" title="BrainCraft">BrainCraft</a></li>
<li><a href="/wiki/NileRed" title="NileRed">NileRed</a></li></ul></div><div class="mw-category-group"><h3>C</h3>
<ul><li><a href="/wiki/Eve_Cornwell" title="Eve Cornwell">Eve Cornwell</a></li>
<li><a href="/wiki/Jackson_Crawford" title="Jackson Crawford">Jackson Crawford</a></li></ul></div><div class="mw-category-group"><h3>D</h3>
<ul><li><a href="/wiki/Zubin_Damania" title="Zubin Damania">Zubin Damania</a></li>
<li><a href="/wiki/Sam_Denby" title="Sam Denby">Sam Denby</a></li>
<li><a href="/wiki/Doctor_Mike" title="Doctor Mike">Doctor Mike</a></li>
<li><a href="/wiki/Peter_Dodds_(mathematician)" title="Peter Dodds (mathematician)">Peter Dodds (mathematician)</a></li>
<li><a href="/wiki/Lindsey_Doe" title="Lindsey Doe">Lindsey Doe</a></li>
<li><a href="/wiki/Caitlin_Doughty" title="Caitlin Doughty">Caitlin Doughty</a></li></ul></div><div class="mw-category-group"><h3>E</h3>
<ul><li><a href="/wiki/Lucy_Bella_Earl" title="Lucy Bella Earl">Lucy Bella Earl</a></li>
<li><a href="/wiki/Katya_Echazarreta" title="Katya Echazarreta">Katya Echazarreta</a></li></ul></div><div class="mw-category-group"><h3>F</h3>
<ul><li><a href="/wiki/Jay_Foreman_(comedian)" title="Jay Foreman (comedian)">Jay Foreman (comedian)</a></li></ul></div><div class="mw-category-group"><h3>G</h3>
<ul><li><a href="/wiki/John_Michael_Godier" title="John Michael Godier">John Michael Godier</a></li>
<li><a href="/wiki/Emily_Graslie" title="Emily Graslie">Emily Graslie</a></li>
<li><a href="/wiki/Green_brothers" title="Green brothers">Green brothers</a></li>
<li><a href="/wiki/Hank_Green" title="Hank Green">Hank Green</a></li>
<li><a href="/wiki/John_Green" title="John Green">John Green</a></li>
<li><a href="/wiki/CGP_Grey" title="CGP Grey">CGP Grey</a></li>
<li><a href="/wiki/Nidhal_Guessoum" title="Nidhal Guessoum">Nidhal Guessoum</a></li></ul></div><div class="mw-category-group"><h3>H</h3>
<ul><li><a href="/wiki/Peter_Hadfield_(journalist)" title="Peter Hadfield (journalist)">Peter Hadfield (journalist)</a></li>
<li><a href="/wiki/Brady_Haran" title="Brady Haran">Brady Haran</a></li>
<li><a href="/wiki/Johnny_Harris_(journalist)" title="Johnny Harris (journalist)">Johnny Harris (journalist)</a></li>
<li><a href="/wiki/Toby_Hendy" title="Toby Hendy">Toby Hendy</a></li>
<li><a href="/wiki/Elena_Herraiz" title="Elena Herraiz">Elena Herraiz</a></li>
<li><a href="/wiki/Andrew_Huang_(musician)" title="Andrew Huang (musician)">Andrew Huang (musician)</a></li></ul></div><div class="mw-category-group"><h3>J</h3>
<ul><li><a href="/wiki/Dr._Jean" title="Dr. Jean">Dr. Jean</a></li>
<li><a href="/wiki/Danielle_Jones_(physician)" title="Danielle Jones (physician)">Danielle Jones (physician)</a></li></ul></div><div class="mw-category-group"><h3>K</h3>
<ul><li><a href="/wiki/Jacob_Knowles" title="Jacob Knowles">Jacob Knowles</a></li>
<li><a href="/wiki/Taras_Kulakov" title="Taras Kulakov">Taras Kulakov</a></li>
<li><a href="/wiki/Kurzgesagt" title="Kurzgesagt">Kurzgesagt</a></li></ul></div><div class="mw-category-group"><h3>L</h3>
<ul><li><a href="/wiki/Sally_Le_Page" title="Sally Le Page">Sally Le Page</a></li>
<li><a href="/wiki/Sandra_Lee_(dermatologist)" title="Sandra Lee (dermatologist)">Sandra Lee (dermatologist)</a></li></ul></div><div class="mw-category-group"><h3>M</h3>
<ul><li><a href="/wiki/Erin_Macdonald" title="Erin Macdonald">Erin Macdonald</a></li>
<li><a href="/wiki/Palaniappan_Manickam" title="Palaniappan Manickam">Palaniappan Manickam</a></li>
<li><a href="/wiki/Scott_Manley" title="Scott Manley">Scott Manley</a></li>
<li><a href="/wiki/Thunderf00t" title="Thunderf00t">Thunderf00t</a></li>
<li><a href="/wiki/Steve_Mould" title="Steve Mould">Steve Mould</a></li>
<li><a href="/wiki/Tom_Murphy_VII" title="Tom Murphy VII">Tom Murphy VII</a></li></ul></div><div class="mw-category-group"><h3>O</h3>
<ul><li><a href="/wiki/William_Osman" title="William Osman">William Osman</a></li></ul></div><div class="mw-category-group"><h3>P</h3>
<ul><li><a href="/wiki/Matt_Parker" title="Matt Parker">Matt Parker</a></li>
<li><a href="/wiki/Doug_Peltz" title="Doug Peltz">Doug Peltz</a></li></ul></div><div class="mw-category-group"><h3>R</h3>
<ul><li><a href="/wiki/Adam_Ragusea" title="Adam Ragusea">Adam Ragusea</a></li>
<li><a href="/wiki/Cody_Reeder" title="Cody Reeder">Cody Reeder</a></li>
<li><span class="redirect-in-category"><a class="mw-redirect" href="/wiki/Henry_Reich" title="Henry Reich">Henry Reich</a></span></li>
<li><a href="/wiki/Julio_R%C3%ADos_Gallego" title="Julio Ríos Gallego">Julio Ríos Gallego</a></li>
<li><a href="/wiki/Mark_Rober" title="Mark Rober">Mark Rober</a></li></ul></div><div class="mw-category-group"><h3>S</h3>
<ul><li><a href="/wiki/Matthew_Santoro" title="Matthew Santoro">Matthew Santoro</a></li>
<li><a href="/wiki/Tom_Scott_(YouTuber)" title="Tom Scott (YouTuber)">Tom Scott (YouTuber)</a></li>
<li><a href="/wiki/Daniel_Shiffman" title="Daniel Shiffman">Daniel Shiffman</a></li>
<li><a href="/wiki/Michael_Stevens_(YouTuber)" title="Michael Stevens (YouTuber)">Michael Stevens (YouTuber)</a></li>
<li><a href="/wiki/Gary_Stevenson_(economist)" title="Gary Stevenson (economist)">Gary Stevenson (economist)</a></li>
<li><a href="/wiki/Mustapha_Swinga" title="Mustapha Swinga">Mustapha Swinga</a></li></ul></div><div class="mw-category-group"><h3>T</h3>
<ul><li><a href="/wiki/Jonathan_Tomines" title="Jonathan Tomines">Jonathan Tomines</a></li></ul></div><div class="mw-category-group"><h3>U</h3>
<ul><li><a href="/wiki/Unjaded_Jade" title="Unjaded Jade">Unjaded Jade</a></li></ul></div><div class="mw-category-group"><h3>V</h3>
<ul><li><a href="/wiki/Larry_Vickers" title="Larry Vickers">Larry Vickers</a></li>
<li><a href="/wiki/Frieda_Vizel" title="Frieda Vizel">Frieda Vizel</a></li>
<li><a href="/wiki/Vsauce" title="Vsauce">Vsauce</a></li>
<li><a href="/wiki/Jake_Roper" title="Jake Roper">Jake Roper</a></li></ul></div><div class="mw-category-group"><h3>Z</h3>
<ul><li><a href="/wiki/Nick_Zentner" title="Nick Zentner">Nick Zentner</a></li></ul></div></div>]
AI Prompt
How do I extract all the links from this?
AI Response
Here’s the code to extract all links and their titles:
links = content[0].find_all('a')
youtubers = [(link.get('title'), 'https://en.wikipedia.org' + link.get('href')) for link in links]

# Let's look at the first few entries
youtubers[:5]
Would you like me to explain any part of this code, or shall we proceed with using this data for the next step in your plan?
links = content[0].find_all('a')
youtubers = [(link.get('title'), 'https://en.wikipedia.org' + link.get('href')) for link in links]

# Let's look at the first few entries
youtubers[:5]
[('Raven Baxter', 'https://en.wikipedia.org/wiki/Raven_Baxter'),
('Léa Bello', 'https://en.wikipedia.org/wiki/L%C3%A9a_Bello'),
('Big Jet TV', 'https://en.wikipedia.org/wiki/Big_Jet_TV'),
('Martin Bonnet', 'https://en.wikipedia.org/wiki/Martin_Bonnet'),
('BrainCraft', 'https://en.wikipedia.org/wiki/BrainCraft')]
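One detail visible in the output above: the Henry Reich entry is wrapped in a redirect-in-category span. If we wanted to keep only direct article links, a possible filter (hypothetical, not used in this session) would be:
# Skip links whose class list includes "mw-redirect".
# Tag.get('class') returns a list of classes, or None, hence the "or []" guard.
direct_links = [l for l in content[0].find_all('a')
                if 'mw-redirect' not in (l.get('class') or [])]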
from fastcore.all import *
youtubers = L(youtubers)
ytn, ytl = youtubers[0]
samp = httpx.get(ytl)
samp.text
'<!DOCTYPE html>\n<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-sticky-header-enabled vector-toc-available" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8">\n<title>Raven Baxter - Wikipedia</title>\n<script>(function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-sticky-header-enabled vector-toc-available";var cookie=document.cookie.match(/(?:^|; )enwikimwclientpreferences=([^;]+)/);if(cookie){cookie[1].split(\'%2C\').forEach(function(pref){className=className.replace(new RegExp(\'(^| )\'+pref.replace(/-clientpref-\\w+$|[^\\w-]+/g,\'\')+\'-clientpref-\\\\w+( |$)\'),\'$1\'+pref+\'$2\');});}document.documentElement.className=className;}());RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy",\n"wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"438d2d17-41ec-42a2-b955-ed5c29ab1558","wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Raven_Baxter","wgTitle":"Raven Baxter","wgCurRevisionId":1260086207,"wgRevisionId":1260086207,"wgArticleId":25570728,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["CS1: long volume value","Articles with short description","Short description matches Wikidata","Articles lacking reliable references from June 2021","All articles lacking reliable references","BLP articles lacking sources from June 2021","Articles with multiple maintenance issues","Pages using infobox person with multiple employers","Articles with hCards","Year of birth missing (living people)","Buffalo State College alumni","American science communicators",\n"American Internet celebrities","Living people","21st-century African-American women","21st-century African-American people","African-American rappers","Educational and science YouTubers","Graduate Women in Science 
members"],"wgPageViewLanguage":"en","wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgRelevantPageName":"Raven_Baxter","wgRelevantArticleId":25570728,"wgIsProbablyEditable":true,"wgRelevantPageIsProbablyEditable":true,"wgRestrictionEdit":[],"wgRestrictionMove":[],"wgNoticeProject":"wikipedia","wgCiteReferencePreviewsActive":false,"wgFlaggedRevsParams":{"tags":{"status":{"levels":1}}},"wgMediaViewerOnClick":true,"wgMediaViewerEnabledByDefault":true,"wgPopupsFlags":0,"wgVisualEditor":{"pageLanguageCode":"en","pageLanguageDir":"ltr","pageVariantFallbacks":"en"},"wgMFDisplayWikibaseDescriptions":{"search":true,"watchlist":true,"tagline":false,"nearby":true},"wgWMESchemaEditAttemptStepOversample":false,"wgWMEPageLength":10000,"wgRelatedArticlesCompat":[],\n"wgEditSubmitButtonLabelPublish":true,"wgULSPosition":"interlanguage","wgULSisCompactLinksEnabled":false,"wgVector2022LanguageInHeader":true,"wgULSisLanguageSelectorEmpty":false,"wgWikibaseItemId":"Q92583026","wgCheckUserClientHintsHeadersJsApi":["brands","architecture","bitness","fullVersionList","mobile","model","platform","platformVersion"],"GEHomepageSuggestedEditsEnableTopics":true,"wgGETopicsMatchModeEnabled":false,"wgGEStructuredTaskRejectionReasonTextInputE'
========== TRUNCATED ==========
'ticky-header" tabindex="-1" data-event-name="wikitext-edit-sticky-header"><span class="vector-icon mw-ui-icon-wikimedia-wikiText mw-ui-icon-wikimedia-wikimedia-wikiText"></span>\n\n<span></span>\n\t\t\t</a>\n\t\t\t<a href="#" class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only" id="ca-ve-edit-sticky-header" tabindex="-1" data-event-name="ve-edit-sticky-header"><span class="vector-icon mw-ui-icon-wikimedia-edit mw-ui-icon-wikimedia-wikimedia-edit"></span>\n\n<span></span>\n\t\t\t</a>\n\t\t\t<a href="#" class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--icon-only" id="ca-viewsource-sticky-header" tabindex="-1" data-event-name="ve-edit-protected-sticky-header"><span class="vector-icon mw-ui-icon-wikimedia-editLock mw-ui-icon-wikimedia-wikimedia-editLock"></span>\n\n<span></span>\n\t\t\t</a>\n\t\t</div>\n\t\t\t<div class="vector-sticky-header-buttons">\n\t\t\t\t<button class="cdx-button cdx-button--weight-quiet mw-interlanguage-selector" id="p-lang-btn-sticky-header" tabindex="-1" data-event-name="ui.dropdown-p-lang-btn-sticky-header"><span class="vector-icon mw-ui-icon-wikimedia-language mw-ui-icon-wikimedia-wikimedia-language"></span>\n\n<span>Add languages</span>\n\t\t\t</button>\n\t\t\t<a href="#" class="cdx-button cdx-button--fake-button cdx-button--fake-button--enabled cdx-button--weight-quiet cdx-button--action-progressive" id="ca-addsection-sticky-header" tabindex="-1" data-event-name="addsection-sticky-header"><span class="vector-icon mw-ui-icon-speechBubbleAdd-progressive mw-ui-icon-wikimedia-speechBubbleAdd-progressive"></span>\n\n<span>Add topic</span>\n\t\t\t</a>\n\t\t</div>\n\t\t\t<div class="vector-sticky-header-icon-end">\n\t\t\t\t<div class="vector-user-links">\n\t\t\t\t</div>\n\t\t\t</div>\n\t\t</div>\n\t</div>\n</div>\n<div class="vector-settings" id="p-dock-bottom">\n\t<ul></ul>\n</div><script>(RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgHostname":"mw-web.eqiad.canary-668bb8c49f-lkzf5","wgBackendResponseTime":134,"wgPageParseReport":{"limitreport":{"cputime":"0.557","walltime":"0.744","ppvisitednodes":{"value":2973,"limit":1000000},"postexpandincludesize":{"value":80910,"limit":2097152},"templateargumentsize":{"value":10606,"limit":2097152},"expansiondepth":{"value":15,"limit":100},"expensivefunctioncount":{"value":4,"limit":500},"unstrip-depth":{"value":1,"limit":20},"unstrip-size":{"value":102479,"limit":5000000},"entityaccesscount":{"value":1,"limit":400},"timingprofile":["100.00% 690.587 1 -total"," 37.39% 258.197 1 Template:Reflist"," 21.25% 146.753 1 Template:Infobox_person"," 16.93% 116.936 1 Template:Short_description"," 13.91% 96.053 1 Template:Cite_thesis"," 13.59% 93.837 20 Template:Cite_web"," 13.53% 93.430 1 Template:Multiple_issues"," 9.53% 65.797 18 Template:Main_other"," 9.08% 62.699 2 Template:Ambox"," 8.90% 61.478 1 Template:BLP_primary_sources"]},"scribunto":{"limitreport-timeusage":{"value":"0.350","limit":"10.000"},"limitreport-memusage":{"value":7872470,"limit":52428800}},"cachereport":{"origin":"mw-web.eqiad.main-8445b8969b-hqd8d","timestamp":"20241228170122","ttl":2592000,"transientcontent":false}}});});</script>\n<script type="application/ld+json">{"@context":"https:\\/\\/schema.org","@type":"Article","name":"Raven 
Baxter","url":"https:\\/\\/en.wikipedia.org\\/wiki\\/Raven_Baxter","sameAs":"http:\\/\\/www.wikidata.org\\/entity\\/Q92583026","mainEntity":"http:\\/\\/www.wikidata.org\\/entity\\/Q92583026","author":{"@type":"Organization","name":"Contributors to Wikimedia projects"},"publisher":{"@type":"Organization","name":"Wikimedia Foundation, Inc.","logo":{"@type":"ImageObject","url":"https:\\/\\/www.wikimedia.org\\/static\\/images\\/wmf-hor-googpub.png"}},"dateModified":"2024-11-28T19:25:36Z","image":"https:\\/\\/upload.wikimedia.org\\/wikipedia\\/commons\\/e\\/e2\\/Raven_Baxter_smiling.png","headline":"American science communicator and educator"}</script>\n</body>\n</html>'
========== MIDDLE OF OUTPUT TRUNCATED ==========
soup = BeautifulSoup(samp.text, 'html.parser')
soup.select("#bodyContent")[0].find_all('p')
[<p><b>Raven Baxter</b> is an American <a href="/wiki/Science_communication" title="Science communication">science communicator</a> and <a href="/wiki/Science,_technology,_engineering,_and_mathematics" title="Science, technology, engineering, and mathematics">STEM</a> educator. Baxter began posting science education videos on social media under the username <b>Raven the Science Maven</b> in 2017.
</p>,
<p>Baxter completed bachelor's and master's degrees in biology at <a class="mw-redirect" href="/wiki/Buffalo_State_College" title="Buffalo State College">Buffalo State College</a> in 2014 and 2016, respectively. For her master's thesis, she studied <a href="/wiki/Protein_folding" title="Protein folding">protein folding</a> in <i><a href="/wiki/Caenorhabditis_elegans" title="Caenorhabditis elegans">Caenorhabditis elegans</a></i>.<sup class="reference" id="cite_ref-1"><a href="#cite_note-1"><span class="cite-bracket">[</span>1<span class="cite-bracket">]</span></a></sup> In 2021, she earned her Ph.D. in curriculum, instruction, and the science of learning from the <a href="/wiki/University_at_Buffalo" title="University at Buffalo">University at Buffalo</a>, and was awarded the SUNY Chancellor's Award for Student Excellence.<sup class="reference" id="cite_ref-:3_2-0"><a href="#cite_note-:3-2"><span class="cite-bracket">[</span>2<span class="cite-bracket">]</span></a></sup> Her dissertation was titled "Culturally responsive Science Communication: The messengers, messages, and voices in communicating science through Hip-Hop".<sup class="reference" id="cite_ref-3"><a href="#cite_note-3"><span class="cite-bracket">[</span>3<span class="cite-bracket">]</span></a></sup>
</p>,
<p>While pursuing her master's degree, Baxter worked as a substitute teacher in <a href="/wiki/Buffalo_Public_Schools" title="Buffalo Public Schools">Buffalo Public Schools</a>.<sup class="reference" id="cite_ref-:3_2-1"><a href="#cite_note-:3-2"><span class="cite-bracket">[</span>2<span class="cite-bracket">]</span></a></sup> She went on to work at <a class="mw-redirect" href="/wiki/AMRI_Global" title="AMRI Global">AMRI Global</a> in drug discovery following her Master's program and recounted her experiences working as a Black woman in science research in a <a href="/wiki/Mother_Jones_(magazine)" title="Mother Jones (magazine)"><i>Mother Jones</i></a> article.<sup class="reference" id="cite_ref-:1_4-0"><a href="#cite_note-:1-4"><span class="cite-bracket">[</span>4<span class="cite-bracket">]</span></a></sup> Baxter also taught as an assistant professor of biology at <a href="/wiki/SUNY_Erie" title="SUNY Erie">Erie Community College</a>, an experience which she says changed her career path.<sup class="reference" id="cite_ref-Our_Stories:_Raven_Baxter_'14,_'16_5-0"><a href="#cite_note-Our_Stories:_Raven_Baxter_'14,_'16-5"><span class="cite-bracket">[</span>5<span class="cite-bracket">]</span></a></sup> While completing her Ph.D. program, Baxter worked as an academic adviser at Buffalo State.
</p>,
<p>Baxter was invited to give the opening keynote at SciComm2020.<sup class="reference" id="cite_ref-6"><a href="#cite_note-6"><span class="cite-bracket">[</span>6<span class="cite-bracket">]</span></a></sup> Following her tweet floating an idea for a <i><a href="/wiki/Dirty_Jobs" title="Dirty Jobs">Dirty Jobs</a></i> spin-off called "Nerdy Jobs", Baxter was contacted by fellow science communicator and <a class="mw-redirect" href="/wiki/Crash_Course_(YouTube)" title="Crash Course (YouTube)">Crash Course</a> host <a href="/wiki/Hank_Green" title="Hank Green">Hank Green</a> with an offer to fund a pilot.<sup class="reference" id="cite_ref-GSE_7-0"><a href="#cite_note-GSE-7"><span class="cite-bracket">[</span>7<span class="cite-bracket">]</span></a></sup><sup class="reference" id="cite_ref-8"><a href="#cite_note-8"><span class="cite-bracket">[</span>8<span class="cite-bracket">]</span></a></sup> Also in 2020, Baxter was recognized as an emerging leader in the field of healthcare, when USA <a href="/wiki/Fortune_(magazine)" title="Fortune (magazine)"><i>Fortune Maga
========== TRUNCATED ==========
</span>17<span class="cite-bracket">]</span></a></sup><sup class="reference" id="cite_ref-:2_18-0"><a href="#cite_note-:2-18"><span class="cite-bracket">[</span>18<span class="cite-bracket">]</span></a></sup><sup class="reference" id="cite_ref-:7_19-0"><a href="#cite_note-:7-19"><span class="cite-bracket">[</span>19<span class="cite-bracket">]</span></a></sup><sup class="reference" id="cite_ref-20"><a href="#cite_note-20"><span class="cite-bracket">[</span>20<span class="cite-bracket">]</span></a></sup>
</p>,
<p>As of 2024, she works <a href="/wiki/Remote_work" title="Remote work">remotely</a> for Mount Sinai Hospital of New York.<sup class="reference" id="cite_ref-:4_21-0"><a href="#cite_note-:4-21"><span class="cite-bracket">[</span>21<span class="cite-bracket">]</span></a></sup>
</p>,
<p>In 2017, Baxter began publishing science education videos under the moniker "Raven the Science Maven."<sup class="reference" id="cite_ref-:2_18-1"><a href="#cite_note-:2-18"><span class="cite-bracket">[</span>18<span class="cite-bracket">]</span></a></sup> Baxter has used her social media platform to counter misinformation around COVID-19 and vaccines. In 2021, she released "Big Ole Geeks," a parody of <a href="/wiki/Megan_Thee_Stallion" title="Megan Thee Stallion">Megan Thee Stallion</a>'s hit "<a href="/wiki/Big_Ole_Freak" title="Big Ole Freak">Big Ole Freak</a>".<sup class="reference" id="cite_ref-Currin_22-0"><a href="#cite_note-Currin-22"><span class="cite-bracket">[</span>22<span class="cite-bracket">]</span></a></sup> "Wipe It Down," Baxter's parody of <a class="mw-redirect" href="/wiki/Lil_Boosie" title="Lil Boosie">Lil Boosie</a>'s "<a href="/wiki/Wipe_Me_Down" title="Wipe Me Down">Wipe Me Down</a>," contained information and safety tips related to the COVID-19 pandemic.<sup class="reference" id="cite_ref-23"><a href="#cite_note-23"><span class="cite-bracket">[</span>23<span class="cite-bracket">]</span></a></sup> Baxter's "The Antibody Song", a parody of <a href="/wiki/Megan_Thee_Stallion" title="Megan Thee Stallion">Megan Thee Stallion</a>'s "<a href="/wiki/Body_(Megan_Thee_Stallion_song)" title="Body (Megan Thee Stallion song)">Body</a>", went viral.<sup class="reference" id="cite_ref-Currin_22-1"><a href="#cite_note-Currin-22"><span class="cite-bracket">[</span>22<span class="cite-bracket">]</span></a></sup><sup class="reference" id="cite_ref-24"><a href="#cite_note-24"><span class="cite-bracket">[</span>24<span class="cite-bracket">]</span></a></sup> In response to one controversial tweet made by <a href="/wiki/Nicki_Minaj" title="Nicki Minaj">Nicki Minaj</a> on side effects of vaccines, Baxter shared a rap song she made on how vaccines work.<sup class="reference" id="cite_ref-25"><a href="#cite_note-25"><span class="cite-bracket">[</span>25<span class="cite-bracket">]</span></a></sup><sup class="reference" id="cite_ref-26"><a href="#cite_note-26"><span class="cite-bracket">[</span>26<span class="cite-bracket">]</span></a></sup>
</p>,
<p>In May 2024, while she was in the process of buying a condo in <a href="/wiki/Virginia_Beach,_Virginia" title="Virginia Beach, Virginia">Virginia Beach</a>, Baxter's broker informed her that the seller rescinded the agreement because she learned that Baxter was Black. Baxter shared the news on Twitter, and following advice of commenters, filed <a href="/wiki/Housing_discrimination_in_the_United_States" title="Housing discrimination in the United States">discrimination</a> claims with the Virginia Fair Housing Office and the <a class="mw-redirect" href="/wiki/US_Department_of_Housing_and_Urban_Development" title="US Department of Housing and Urban Development">US Department of Housing and Urban Development</a>. Baxter's broker later said that the seller changed her mind, and <i><a href="/wiki/The_New_York_Times" title="The New York Times">The New York Times</a></i> reported that the sale was set to close later in the summer.<sup class="reference" id="cite_ref-:4_21-1"><a href="#cite_note-:4-21"><span class="cite-bracket">[</span>21<span class="cite-bracket">]</span></a></sup>
</p>]
========== MIDDLE OF OUTPUT TRUNCATED ==========
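These <p> tags still carry markup and citation superscripts. If plain text were all we needed, BeautifulSoup's get_text() would flatten each paragraph (an aside; the session keeps the HTML for the markdown conversion below):
# Hypothetical: strip tags and collapse each paragraph to plain text.
paras = soup.select("#bodyContent")[0].find_all('p')
plain = [para.get_text(strip=True) for para in paras]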
AI Prompt
Is there a way to extract all headings and paragraphs (h2, h3, h4…) while keeping the order?
AI Response
Yes! We can get all headings and paragraphs in order using a combination of selectors or by finding specific tags. Here’s one way to do it:
content = soup.select("#bodyContent")[0]
text = content.find_all(['h2', 'h3', 'h4', 'p'])

# Let's look at the first few elements to check
text[:5]
Would you like to try this? We can then process the text further to clean it up if needed. Let me know if you want to see what this returns first.
content = soup.select("#bodyContent")[0]
text = content.find_all(['h2', 'h3', 'h4', 'p'])

# Let's look at the first few elements to check
text[:5]
[<p><b>Raven Baxter</b> is an American <a href="/wiki/Science_communication" title="Science communication">science communicator</a> and <a href="/wiki/Science,_technology,_engineering,_and_mathematics" title="Science, technology, engineering, and mathematics">STEM</a> educator. Baxter began posting science education videos on social media under the username <b>Raven the Science Maven</b> in 2017.
</p>,
<h2 id="Education">Education</h2>,
<p>Baxter completed bachelor's and master's degrees in biology at <a class="mw-redirect" href="/wiki/Buffalo_State_College" title="Buffalo State College">Buffalo State College</a> in 2014 and 2016, respectively. For her master's thesis, she studied <a href="/wiki/Protein_folding" title="Protein folding">protein folding</a> in <i><a href="/wiki/Caenorhabditis_elegans" title="Caenorhabditis elegans">Caenorhabditis elegans</a></i>.<sup class="reference" id="cite_ref-1"><a href="#cite_note-1"><span class="cite-bracket">[</span>1<span class="cite-bracket">]</span></a></sup> In 2021, she earned her Ph.D. in curriculum, instruction, and the science of learning from the <a href="/wiki/University_at_Buffalo" title="University at Buffalo">University at Buffalo</a>, and was awarded the SUNY Chancellor's Award for Student Excellence.<sup class="reference" id="cite_ref-:3_2-0"><a href="#cite_note-:3-2"><span class="cite-bracket">[</span>2<span class="cite-bracket">]</span></a></sup> Her dissertation was titled "Culturally responsive Science Communication: The messengers, messages, and voices in communicating science through Hip-Hop".<sup class="reference" id="cite_ref-3"><a href="#cite_note-3"><span class="cite-bracket">[</span>3<span class="cite-bracket">]</span></a></sup>
</p>,
<h2 id="Career">Career</h2>,
<p>While pursuing her master's degree, Baxter worked as a substitute teacher in <a href="/wiki/Buffalo_Public_Schools" title="Buffalo Public Schools">Buffalo Public Schools</a>.<sup class="reference" id="cite_ref-:3_2-1"><a href="#cite_note-:3-2"><span class="cite-bracket">[</span>2<span class="cite-bracket">]</span></a></sup> She went on to work at <a class="mw-redirect" href="/wiki/AMRI_Global" title="AMRI Global">AMRI Global</a> in drug discovery following her Master's program and recounted her experiences working as a Black woman in science research in a <a href="/wiki/Mother_Jones_(magazine)" title="Mother Jones (magazine)"><i>Mother Jones</i></a> article.<sup class="reference" id="cite_ref-:1_4-0"><a href="#cite_note-:1-4"><span class="cite-bracket">[</span>4<span class="cite-bracket">]</span></a></sup> Baxter also taught as an assistant professor of biology at <a href="/wiki/SUNY_Erie" title="SUNY Erie">Erie Community College</a>, an experience which she says changed her career path.<sup class="reference" id="cite_ref-Our_Stories:_Raven_Baxter_'14,_'16_5-0"><a href="#cite_note-Our_Stories:_Raven_Baxter_'14,_'16-5"><span class="cite-bracket">[</span>5<span class="cite-bracket">]</span></a></sup> While completing her Ph.D. program, Baxter worked as an academic adviser at Buffalo State.
</p>]
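As an aside, the same in-order result can be had with a comma-separated CSS selector, since select() also returns matches in document order:
text = content.select('h2, h3, h4, p')  # equivalent to find_all(['h2', 'h3', 'h4', 'p']) here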
AI Prompt
Ok. Great. What options do I have to convert this to markdown?
AI Response
There are a few approaches we could take to convert this to markdown:

1. Use the html2text library, which handles most conversions automatically.
2. Write our own converter, mapping h2 to ##, h3 to ###, etc., and links to [text](url).

Which approach would you prefer to try? If you want to write your own converter, we could start with a simple version and build it up step by step.
Would you like me to explain more about either option?
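For a sense of what option 2 would involve, here is a minimal hypothetical sketch (heading depth taken from the tag name, links rewritten inline; the session goes with html2text instead):
def to_md(el):
    # '#' per heading level: h2 -> ##, h3 -> ###, h4 -> ####
    if el.name in ('h2', 'h3', 'h4'):
        return '#' * int(el.name[1]) + ' ' + el.get_text(strip=True)
    # Rewrite <a> tags as [text](url), then flatten the rest to plain text.
    # Note: replace_with mutates the parsed tree.
    for a in el.find_all('a'):
        a.replace_with(f"[{a.get_text()}]({a.get('href')})")
    return el.get_text()

md = '\n\n'.join(to_md(el) for el in text)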
import html2text
AI Prompt
html2text is going to work
AI Response
Great choice! Let’s try it with one element first to see how it works. Here’s how we can use html2text:
h = html2text.HTML2Text()
sample = str(text[0])  # Let's try with the first paragraph
print(h.handle(sample))
Would you like to try this and see how it looks before we process all elements?
h = html2text.HTML2Text()
sample = str("".join(L(text)[:5].map(str)))
print(h.handle(sample))
**Raven Baxter** is an American [science
communicator](/wiki/Science_communication "Science communication") and
[STEM](/wiki/Science,_technology,_engineering,_and_mathematics "Science,
technology, engineering, and mathematics") educator. Baxter began posting
science education videos on social media under the username **Raven the
Science Maven** in 2017.
## Education
Baxter completed bachelor's and master's degrees in biology at [Buffalo State
College](/wiki/Buffalo_State_College "Buffalo State College") in 2014 and
2016, respectively. For her master's thesis, she studied [protein
folding](/wiki/Protein_folding "Protein folding") in _[Caenorhabditis
elegans](/wiki/Caenorhabditis_elegans "Caenorhabditis elegans")_.[1] In 2021,
she earned her Ph.D. in curriculum, instruction, and the science of learning
from the [University at Buffalo](/wiki/University_at_Buffalo "University at
Buffalo"), and was awarded the SUNY Chancellor's Award for Student
Excellence.[2] Her dissertation was titled "Culturally responsive Science
Communication: The messengers, messages, and voices in communicating science
through Hip-Hop".[3]
## Career
While pursuing her master's degree, Baxter worked as a substitute teacher in
[Buffalo Public Schools](/wiki/Buffalo_Public_Schools "Buffalo Public
Schools").[2] She went on to work at [AMRI Global](/wiki/AMRI_Global "AMRI
Global") in drug discovery following her Master's program and recounted her
experiences working as a Black woman in science research in a [_Mother
Jones_](/wiki/Mother_Jones_\(magazine\) "Mother Jones \(magazine\)")
article.[4] Baxter also taught as an assistant professor of biology at [Erie
Community College](/wiki/SUNY_Erie "SUNY Erie"), an experience which she says
changed her career path.[5] While completing her Ph.D. program, Baxter worked
as an academic adviser at Buffalo State.
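html2text exposes a few knobs on the HTML2Text object worth knowing about (not needed here; the hard line wrapping in the output above is its default behavior):
h = html2text.HTML2Text()
h.body_width = 0       # 0 disables the default hard wrap at 78 characters
h.ignore_links = True  # render anchors as plain text instead of [text](url)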
def scrape_url(url):
    h = html2text.HTML2Text()
    res = httpx.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    content = soup.select("#bodyContent")[0]
    text = L(content.find_all(['h2', 'h3', 'h4', 'p'])).map(str)
    return h.handle("".join(text))
scrape_url(ytl)
'**Raven Baxter** is an American [science\ncommunicator](/wiki/Science_communication "Science communication") and\n[STEM](/wiki/Science,_technology,_engineering,_and_mathematics "Science,\ntechnology, engineering, and mathematics") educator. Baxter began posting\nscience education videos on social media under the username **Raven the\nScience Maven** in 2017.\n\n## Education\n\nBaxter completed bachelor\'s and master\'s degrees in biology at [Buffalo State\nCollege](/wiki/Buffalo_State_College "Buffalo State College") in 2014 and\n2016, respectively. For her master\'s thesis, she studied [protein\nfolding](/wiki/Protein_folding "Protein folding") in _[Caenorhabditis\nelegans](/wiki/Caenorhabditis_elegans "Caenorhabditis elegans")_.[1] In 2021,\nshe earned her Ph.D. in curriculum, instruction, and the science of learning\nfrom the [University at Buffalo](/wiki/University_at_Buffalo "University at\nBuffalo"), and was awarded the SUNY Chancellor\'s Award for Student\nExcellence.[2] Her dissertation was titled "Culturally responsive Science\nCommunication: The messengers, messages, and voices in communicating science\nthrough Hip-Hop".[3]\n\n## Career\n\nWhile pursuing her master\'s degree, Baxter worked as a substitute teacher in\n[Buffalo Public Schools](/wiki/Buffalo_Public_Schools "Buffalo Public\nSchools").[2] She went on to work at [AMRI Global](/wiki/AMRI_Global "AMRI\nGlobal") in drug discovery following her Master\'s program and recounted her\nexperiences working as a Black woman in science research in a [_Mother\nJones_](/wiki/Mother_Jones_\\(magazine\\) "Mother Jones \\(magazine\\)")\narticle.[4] Baxter also taught as an assistant professor of biology at [Erie\nCommunity College](/wiki/SUNY_Erie "SUNY Erie"), an experience which she says\nchanged her career path.[5] While completing her Ph.D. 
program, Baxter worked\nas an academic adviser at Buffalo State.\n\nBaxter was invited to give the opening keynote at SciComm2020.[6] Following\nher tweet floating an idea for a _[Dirty Jobs](/wiki/Dirty_Jobs "Dirty Jobs")_\nspin-off called "Nerdy Jobs", Baxter was contacted by fellow science\ncommunicator and [Crash Course](/wiki/Crash_Course_\\(YouTube\\) "Crash Course\n\\(YouTube\\)") host [Hank Green](/wiki/Hank_Green "Hank Green") with an offer\nto fund a pilot.[7][8] Also in 2020, Baxter was recognized as an emerging\nleader in the field of healthcare, when USA [_Fortune\nMagazine_](/wiki/Fortune_\\(magazine\\) "Fortune \\(magazine\\)") named her to\ntheir "[40 Under 40](/wiki/40_Under_40 "40 Under 40") in Health".[9][10][11]\n\nBaxter launched a STEM-themed apparel line called Smarty Pants Clothing in\nJanuary 2021.[12][13] She presented at the RTI Fellows Program Distinguished\nLecture Series on leadership and communication in 2021.[14] Also in 2021,\nBaxter and fellow University of Buffalo PhD candidate Olivia Geneus organized,\nrespectively, the #BlackInSciCom and #BlackInNano [Twitter](/wiki/Twitter\n"Twitter") hashtags as part of their #BlackInX initiative to increase\nvisibility of Black scientists.[15] She partnered with the National Center for\nScience Education (NCSE) during Earth Week 2021 to create a series of videos\nto educate students about climate change.[16]\n\nIn 2022, Baxter was recognized as an _[Ebony](/wiki/Ebony_\\(magazine\\) "Ebony\n\\(magazine\\)")_ Magazine Power 100 Trailblazer in [STEM](/wiki/STEM "STEM"),\nby [_Forbes Magazine_](/wiki/Forbes "Forbes") among their "[30 Under\n30](/wiki/30_Under_30 "30 Under 30")" in Science for her science\ncommunication, in particular through rapping, and with a Woman of Distinction\nAward from the [American Association of University\nWomen](/wiki/American_Association_of_University_Women "American Association of\nUniversity Women").[17][18][19][20]\n\nAs of 2024, she works [remotely](/wiki/Remote_work "Remote work") for Mount\nSinai Hospital of New York.[21]\n\n## Music\n\nIn 2017, Baxter began publishing science education videos under the moniker\n"Raven the Science Maven."[18] Baxter has used her social media platform to\ncounter misinformation around COVID-19 and vaccines. In 2021, she released\n"Big Ole Geeks," a parody of [Megan Thee Stallion](/wiki/Megan_Thee_Stallion\n"Megan Thee Stallion")\'s hit "[Big Ole Freak](/wiki/Big_Ole_Freak "Big Ole\nFreak")".[22] "Wipe It Down," Baxter\'s parody of [Lil Boosie](/wiki/Lil_Boosie\n"Lil Boosie")\'s "[Wipe Me Down](/wiki/Wipe_Me_Down "Wipe Me Down")," contained\ninformation and safety tips related to the COVID-19 pandemic.[23] Baxter\'s\n"The Antibody Song", a parody of [Megan Thee\nStallion](/wiki/Megan_Thee_Stallion "Megan Thee Stallion")\'s\n"[Body](/wiki/Body_\\(Megan_Thee_Stallion_song\\) "Body \\(Megan Thee Stallion\nsong\\)")", went viral.[22][24] In response to one controversial tweet made by\n[Nicki Minaj](/wiki/Nicki_Minaj "Nicki Minaj") on side effects of vaccines,\nBaxter shared a rap song she made on how vaccines work.[25][26]\n\n## Personal life\n\nIn May 2024, while she was in the process of buying a condo in [Virginia\nBeach](/wiki/Virginia_Beach,_Virginia "Virginia Beach, Virginia"), Baxter\'s\nbroker informed her that the seller rescinded the agreement because she\nlearned that Baxter was Black. 
Baxter shared the news on Twitter, and\nfollowing advice of commenters, filed\n[discrimination](/wiki/Housing_discrimination_in_the_United_States "Housing\ndiscrimination in the United States") claims with the Virginia Fair Housing\nOffice and the [US Department of Housing and Urban\nDevelopment](/wiki/US_Department_of_Housing_and_Urban_Development "US\nDepartment of Housing and Urban Development"). Baxter\'s broker later said that\nthe seller changed her mind, and _[The New York\nTimes](/wiki/The_New_York_Times "The New York Times")_ reported that the sale\nwas set to close later in the summer.[21]\n\n## Awards\n\n## References\n\n## External links\n\n'
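A small note on the next cell: itemgetter comes from Python's operator module (here it is presumably available via the earlier from fastcore.all import *); with selective imports you'd bring it in yourself:
from operator import itemgetter
urls = youtubers.map(itemgetter(1))  # pick the URL out of each (title, url) pair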
from fastcore.parallel import parallel
texts = parallel(scrape_url, youtubers.map(itemgetter(1)), threadpool=True)
texts[-3:]
(#3) ['**Vsauce** ([/ˈviːsɔːs/](/wiki/Help:IPA/English "Help:IPA/English")) is a\n[YouTube](/wiki/YouTube "YouTube") brand created by educator [Michael\nStevens](/wiki/Michael_Stevens_\\(YouTuber\\) "Michael Stevens\n\\(YouTuber\\)").[3] The channels feature videos on scientific, psychological,\nmathematical, and philosophical topics, as well as [gaming](/wiki/Video_game\n"Video game"), [technology](/wiki/Technology "Technology"), popular culture,\nand other general interest subjects.[4][5]\n\n## History\n\nIn 2009, Michael Stevens was asked by a company to pitch them a show about\nfood, so he teamed up with his friend Justin-superstar from Los Angeles, CA to\ncreate a pilot episode showing them using a hammer to supposedly make a peanut\nbutter and banana sandwich in under a second, titling the proposed show "Food\nSmashers", but the show was never made. The company decided to instead ask him\nto make a YouTube channel about video games and name it "Video Game Nation";\nhe agreed to do it but didn\'t like the name, so he decided to call it "Vsauce"\ninstead.[6]\n\nMichael Stevens came up with the name "Vsauce" in 2010 by generating available\npronounceable ".com" domain names using a website called FakeNameGenerator.com\nuntil he found one he liked.[7] On July 30, 2007, someone created a YouTube\naccount named "Vsauce" but had never used it, and on April 16, 2010, YouTube\ngranted Michael Stevens\'s request to claim the account, as was their policy\nback then; if he weren\'t able to claim it, he might\'ve decided to instead use\nhis first YouTube channel, "pooplicker888".[8] Initially, the channel\'s\nprogramming focused on video games and featured a number of hosts.[9] However,\ncertain segments such as _IMG!_ slowly took over, and Stevens became the sole\nhost.[_[citation needed](/wiki/Wikipedia:Citation_needed "Wikipedia:Citation\nneeded")_] The channel then became a mix of information and online activities,\nand solely educational segments popped up.[_[citation\nneeded](/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")_] The\neducational segments became more popular, and since September 9, 2012, only\nthe educational segment (known as _DOT._) has been presented.[10]\n\nIn December 2010, the Vsauce2 (on December 7) and Vsauce3 (on December 24)\nchannels were created.[11] On July 25, 2012, the WeSauce channel was\ncreated.[12] Vsauce was one of the fastest growing channels during September\n2012.[13] During that month, the main Vsauce channel reached 1 million\nsubscribers.[14]\n\nStevens, in the video "A Defense of Comic Sans", notes that the text font\nAlsina was used by the Vsauce channels because of its close resemblance to the\nhandwriting of Nik Guinta, the creator of the original Vsauce logo.[15] A new\nbranding scheme designed by Natasha Jen was adopted in December 2014 to bring\na "grown-up" feel to the channels. 
It utilizes the DIN Next Rounded font and\nfluid designs to convey the idea of sauce that the name "Vsauce" implies.[16]\n\n## Channels\n\n### Vsauce\n\nThe Vsauce channel, also known as **Vsauce1** to distinguish it from the brand\nas a whole and the other channels, is presented by the founder, [Michael\nStevens](/wiki/Michael_Stevens_\\(YouTuber\\) "Michael Stevens \\(YouTuber\\)"),\nand features videos about science, mathematics,\n[anthropology](/wiki/Anthropology "Anthropology"), and\n[philosophy](/wiki/Philosophy "Philosophy").[_[citation\nneeded](/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")_] The\nprimary series features Stevens discussing a topic or question in a tangential\nmanner, including various interpretations of the question and related facts\nand observations.[17] In 2013, Stevens stated that he researches on Wikipedia\nand academic papers to find information for his videos.[18] Stevens\ncollaborated with educator and television personality [Adam\nSavage](/wiki/Adam_Savage "Adam Savage") in 2017 for a show titled _Brain\nCandy Live_.[19]\n\nVideos by Vsauce have been featured on online news publications,[20] such as\n_[The Huffington Post](/wiki/The_Huffingto
========== TRUNCATED ==========
** is an American academic who works as a geology professor at\n[Central Washington University](/wiki/Central_Washington_University "Central\nWashington University") in [Ellensburg,\nWashington](/wiki/Ellensburg,_Washington "Ellensburg, Washington"). Outside of\nhis work with the university, he is known for his online videos covering the\ngeology of the [Northwestern United States](/wiki/Northwestern_United_States\n"Northwestern United States") and his series _Nick on the Rocks_.\n\n## Early life and education\n\nZentner was born on July 20, 1962, in the state of [Wisconsin](/wiki/Wisconsin\n"Wisconsin").[1] Inspired during a 1983 trip to the [Pacific\nNorthwest](/wiki/Pacific_Northwest "Pacific Northwest") on break from college,\nZentner decided to study geology.[2] Zentner earned his [Bachelor of\nScience](/wiki/Bachelor_of_Science "Bachelor of Science") from the [University\nof Wisconsin–Madison](/wiki/University_of_Wisconsin%E2%80%93Madison\n"University of Wisconsin–Madison") in 1986 and a [Master of\nScience](/wiki/Master_of_Science "Master of Science") from [Idaho State\nUniversity](/wiki/Idaho_State_University "Idaho State University") in 1989.\n\n## Career\n\nFrom 1989 to 1992 Zentner taught geology at the [Miami\nUniversity](/wiki/Miami_University "Miami University") in [Oxford,\nOhio](/wiki/Oxford,_Ohio "Oxford, Ohio"). After that, he took on his current\nposition teaching geology at [Central Washington\nUniversity](/wiki/Central_Washington_University "Central Washington\nUniversity") in [Ellensburg, Washington](/wiki/Ellensburg,_Washington\n"Ellensburg, Washington").\n\nHe is known for his work in making online videos and public lectures covering\nthe geology of the [Northwestern United\nStates](/wiki/Northwestern_United_States "Northwestern United States").[3]\nStarting in the mid-2000s, he gave lectures to the public about local geology\ntopics, such as one on the [Columbia River Basalt\nGroup](/wiki/Columbia_River_Basalt_Group "Columbia River Basalt Group"). These\nled to the production of a series of shorts he made called _Two Minute\nGeology_. He then produced the series _Nick on the Rocks_ that is aired on\n[KCTS-TV](/wiki/KCTS-TV "KCTS-TV") to the [Seattle metropolitan\narea](/wiki/Seattle_metropolitan_area "Seattle metropolitan area").[4]\n\nDuring the [COVID-19 pandemic](/wiki/COVID-19_pandemic "COVID-19 pandemic")\nZentner created [live streaming](/wiki/Live_streaming "Live streaming")\nlectures on various geology topics in a series called _Nick From Home_. In the\n_Nick From Home_ series he focused on trying to provide an educational\nenvironment for children who might be out of school and to viewers from around\nthe world.[5]\n\n## Personal life\n\nZentner is a practicing Catholic and is married to a science teacher at\n[Ellensburg High School](/wiki/Ellensburg_High_School "Ellensburg High\nSchool"). They have three sons.[6][7]\n\n## Honors and awards\n\nIn 2024, Zentner was recognized by the [Geological Association of\nCanada](/wiki/Geological_Association_of_Canada "Geological Association of\nCanada") with their [E. R. Ward Neale Medal](/wiki/E._R._Ward_Neale_Medal "E.\nR. Ward Neale Medal") for sustained outstanding efforts in sharing Earth\nscience with Canadians. 
[8] In 2015, Zentner won the James Shea Award\npresented by the [National Association of Geoscience\nTeachers](/wiki/National_Association_of_Geoscience_Teachers "National\nAssociation of Geoscience Teachers") for his work in presenting the geology of\nthe Pacific Northwest to a broad audience.[9] Two episodes of _Nick on the\nRocks_ , "Teanaway Tropics" and "Saddle Mountains Buried in Ash", were granted\n[Northwest Emmy Awards](/wiki/Northwest_Emmy_Awards "Northwest Emmy Awards")\nby the [National Academy of Television Arts and\nSciences](/wiki/National_Academy_of_Television_Arts_and_Sciences "National\nAcademy of Television Arts and Sciences") on June 5, 2021.[10]\n\nIn 2023, he was also presented with the GSA Public Service Award by The\nGeological Society of America.[11]\n\n## Publications\n\n## References\n\n## External links\n\n']
========== MIDDLE OF OUTPUT TRUNCATED ==========
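One caveat on the parallel scrape (an addition, not from the session): by default parallel uses one worker per CPU, all hitting Wikipedia at once. Its n_workers argument caps concurrency if you want to be gentler:
# Hypothetical gentler run: at most 4 requests in flight at a time.
texts = parallel(scrape_url, youtubers.map(itemgetter(1)), threadpool=True, n_workers=4)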
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

result = genai.embed_content(
    model="models/text-embedding-004",
    content="What is the meaning of life?")
print(str(result['embedding']))
[-0.010632273, 0.019375853, 0.020965198, 0.0007706437, -0.061464068, 0.014739741, -0.0022759985, 0.013184195, 0.014464715, 0.022593116, ...]
========== OUTPUT TRUNCATED: remaining embedding values elided ==========
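The raw vector is far too long to eyeball, so a length check is a more practical sanity test. A minimal sketch (768 is the documented output dimensionality for text-embedding-004, but it's worth confirming on your own result):

len(result['embedding'])  # expected: 768 for text-embedding-004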
import numpy as np

def embed_text(text):
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text[:9500])  # clip long entries to stay within the model's input limit
    return np.array(result['embedding'])

embed_text("The meaning of life is 42")
array([-0.01066234,  0.00649511,  0.01235603,  0.01205517, -0.0474491 ,
        0.00051895,  0.02142454,  0.01516305, -0.0048512 ,  0.02098505,
        ...])
========== OUTPUT TRUNCATED: remaining embedding values elided ==========
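A quick way to check that these vectors behave like semantic embeddings is to compare a few of them with cosine similarity: related texts should score higher than unrelated ones. A minimal sketch, assuming only the embed_text function defined above (the example strings are made up for illustration):

a = embed_text("The meaning of life is 42")
b = embed_text("Douglas Adams wrote The Hitchhiker's Guide to the Galaxy")
c = embed_text("How to descale an espresso machine")

def cos_sim(u, v):
    # cosine similarity: dot product of the L2-normalised vectors
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

cos_sim(a, b), cos_sim(a, c)  # the first pair should come out noticeably higher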
embeds = parallel(embed_text, texts, threadpool=True)
len(embeds), embeds[:3]
(64,
 (#3) [array([-0.03438961,  0.01376416, -0.0229827 ,  0.0183366 ,  0.02228629,
         0.01466555,  0.03573985,  0.06705088, -0.00618027, -0.06189201,
         ...])])
========== OUTPUT TRUNCATED: remaining embedding values elided ==========
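The parallel call above is presumably fastcore's parallel helper, which maps embed_text over all 64 texts in a thread pool; threads suit this workload because each call mostly waits on the network. If fastcore isn't available, the standard library gives a rough equivalent (a sketch, not the code the notebook ran):

from concurrent.futures import ThreadPoolExecutor

# Run the network-bound embedding calls concurrently in a thread pool
with ThreadPoolExecutor(max_workers=8) as ex:
    embeds = list(ex.map(embed_text, texts))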
AI Prompt
I would like to do some dimensionality reduction using both t-SNE and UMAP and then visualise the result. Remind me how to do it
AI Response
I’ll help you with both UMAP and t-SNE visualization. Let’s do this step by step. First, we’ll need to import the required libraries. Would you like to start with UMAP or t-SNE?
We’ll need:
from sklearn.manifold import TSNE
import umap
import matplotlib.pyplot as plt
Which method would you like to try first? Also, would you like me to help you set up the plotting functions, or would you prefer to write those yourself?
AI Prompt
Let’s start with t-SNE
AI Response
OK, let’s start with t-SNE. First, we need to convert our embeddings into a format suitable for t-SNE. Here’s how we can stack them into a numpy array:
X = np.stack(embeds)
Would you like me to help you set up the t-SNE model with some good default parameters? We’ll need to decide on:
- perplexity (usually between 5 and 50)
- number of components (usually 2 for visualization)
- number of iterations
Would you like suggestions for these parameters?
AI Prompt
explain to me the effect of perplexity and the number of iterations
AI Response
Let me explain these parameters:
Perplexity:
- It’s like a balance between local and global structure in your data
- Lower values (5-10) focus more on local groups, showing tight clusters but might miss bigger patterns
- Higher values (30-50) preserve more global structure but might blur local details
- Rule of thumb: try perplexity = number_of_points/3, but not more than 50

Number of iterations:
- Controls how long t-SNE tries to optimize the visualization
- Too few iterations might not find a good layout (under-optimized)
- Too many iterations take longer but might not improve results much
- Usually 1000-2000 is good, but complex data might need 3000-5000
Would you like to try different values to see their effects on our data? We could start with a medium perplexity and see how it looks.
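To make the trade-off concrete, one option is to fit t-SNE at several perplexities and plot the layouts side by side; a minimal sketch, assuming the stacked embedding array X created in the next cell (the perplexity values are just illustrative):

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, perp in zip(axes, [5, 20, 50]):
    # Refit from scratch at each perplexity; fix the seed so runs are comparable
    pts = TSNE(perplexity=perp, n_components=2, random_state=42).fit_transform(X)
    ax.scatter(pts[:, 0], pts[:, 1])
    ax.set_title(f'perplexity={perp}')
plt.show()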
X = np.stack(embeds)
from sklearn.manifold import TSNE
import umap
import matplotlib.pyplot as plt

tsne = TSNE(perplexity=20, n_iter=3000, n_components=2)
ty = tsne.fit_transform(X)
ty
/app/data/.local/lib/python3.11/site-packages/sklearn/manifold/_t_sne.py:1164: FutureWarning: 'n_iter' was renamed to 'max_iter' in version 1.5 and will be removed in 1.7.
warnings.warn(
array([[ 2.0962691 , 3.2623792 ],
[-0.01474171, -0.78255713],
[ 7.173395 , -4.4393864 ],
[ 9.657282 , -2.0129142 ],
[ 7.7364535 , 0.56770664],
[ 8.838909 , -0.9247257 ],
[ 1.7379007 , -4.4938154 ],
[11.327545 , -0.06525223],
[ 3.4592927 , 4.8374286 ],
[ 4.440558 , -0.24623805],
[ 4.509141 , 5.0424767 ],
[ 8.363378 , -3.5562203 ],
[ 1.0006253 , 2.3865294 ],
[ 1.2940748 , 1.3286926 ],
[ 0.3420515 , -2.732964 ],
[ 2.0268888 , -0.84108603],
[ 3.785712 , -3.0157082 ],
[-1.4457676 , 1.6808391 ],
[ 2.320461 , 1.3557605 ],
[ 3.7793667 , 1.5371794 ],
[ 3.7221498 , 1.5720315 ],
[ 3.4832294 , 1.9030731 ],
[ 5.5411315 , -0.96120876],
[ 5.2471104 , -5.830942 ],
[ 7.2668247 , -2.8886786 ],
[ 5.9599705 , -1.518288 ],
[ 5.1840568 , 2.0868518 ],
[ 3.2136133 , -1.1554371 ],
[ 0.20817119, -2.2902033 ],
[ 5.049799 , 0.8516518 ],
[ 0.67488366, 4.21842 ],
[ 0.85971594, 3.0906482 ],
[ 9.264497 , 3.5407863 ],
[ 6.1810956 , 5.3305764 ],
[ 7.2909226 , -0.72534674],
[ 0.21178165, -0.04114036],
[ 2.1886728 , 6.018978 ],
[ 2.469293 , -0.5972144 ],
[ 3.2227914 , 4.956482 ],
[ 4.090886 , -1.055039 ],
[ 7.9561963 , -1.8239459 ],
[ 6.1233964 , -2.8844044 ],
[10.451141 , 2.1489723 ],
[ 6.452715 , 2.1963952 ],
[ 5.540473 , -2.2655385 ],
[ 8.066526 , 4.5633774 ],
[ 8.4469385 , 2.282895 ],
[ 7.144918 , 3.1234853 ],
[ 7.0068707 , -0.94445944],
[ 1.7166494 , -2.3697484 ],
[ 7.0749736 , 3.0573676 ],
[ 5.2974505 , 0.5715443 ],
[ 4.86757 , -1.9088445 ],
[ 9.252375 , 0.54759043],
[ 6.8004565 , 1.2902875 ],
[ 4.9385285 , -3.6258545 ],
[ 4.8979983 , -5.824331 ],
[ 2.0671992 , 6.359227 ],
[ 2.1149201 , -4.270968 ],
[ 6.1253514 , 4.18356 ],
[ 4.491753 , 6.8102603 ],
[ 7.00735 , 1.1310642 ],
[ 7.628216 , 2.0558517 ],
[ 4.979117 , 3.2210119 ]], dtype=float32)
AI Prompt
Let’s do a visualisation
AI Response
I’ll help you create a scatter plot of the t-SNE results. We can start with a basic visualization and then enhance it with labels if you’d like. Here’s a simple plot:
plt.figure(figsize=(10, 8))
plt.scatter(ty[:, 0], ty[:, 1])
plt.title('t-SNE visualization of YouTuber embeddings')
plt.xlabel('t-SNE component 1')
plt.ylabel('t-SNE component 2')
plt.show()
Would you like to:
1. Add labels for some/all points?
2. Change the colors based on some characteristic?
3. Adjust the figure style or size?
Let me know which enhancements you’d like to try!
plt.figure(figsize=(12, 10))  # Made it a bit bigger for the labels
plt.scatter(ty[:, 0], ty[:, 1])

# Add labels for each point
for i, (name, _) in enumerate(youtubers):
    plt.annotate(name, (ty[i, 0], ty[i, 1]), xytext=(5, 5), textcoords='offset points', fontsize=8)

plt.title('t-SNE visualization of YouTuber embeddings')
plt.xlabel('t-SNE component 1')
plt.ylabel('t-SNE component 2')
# Adjust layout to prevent label clipping
plt.tight_layout()
plt.show()
AI Prompt
All right, let’s try UMAP now
AI Response
OK, let’s create a UMAP visualization. UMAP has a few key parameters we can adjust:
- n_neighbors (similar to perplexity in t-SNE)
- min_dist (controls how tightly points cluster together)
- n_components (we’ll use 2 for visualization)
Here’s a basic UMAP setup:
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
embedding = reducer.fit_transform(X)
Would you like me to explain these parameters before we visualize, or shall we plot it and then adjust parameters based on what we see?
AI Prompt
what are good heuristics for picking n_neighbors and min_dist?
AI Response
Here are the key heuristics for these UMAP parameters:
n_neighbors:
- Controls the balance between local and global structure
- Smaller values (5-15): Focus on local structure, more clusters
- Larger values (30-100): Preserve more global structure
- Rule of thumb: sqrt(n_samples) is often a good starting point
- For our dataset (~64 samples), 15-20 would be reasonable

min_dist:
- Controls how tightly points cluster together
- Small values (0.0-0.1): Tighter, more compact clusters
- Larger values (0.5-0.8): More evenly spaced points
- Rule of thumb:
  - 0.0-0.1 for clustering visualization
  - 0.3-0.5 for general structure
  - >0.8 for more uniform distributions
Would you like to try different values to see how they affect our visualization?
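One way to internalise these heuristics is to sweep a small grid of both parameters and compare the layouts; a minimal sketch, assuming the array X from above (the grid values are illustrative):

import umap
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
for ax, (nn, md) in zip(axes.flat, [(5, 0.05), (5, 0.5), (30, 0.05), (30, 0.5)]):
    # Fit a fresh reducer for each (n_neighbors, min_dist) combination
    pts = umap.UMAP(n_neighbors=nn, min_dist=md, n_components=2).fit_transform(X)
    ax.scatter(pts[:, 0], pts[:, 1])
    ax.set_title(f'n_neighbors={nn}, min_dist={md}')
plt.show()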
reducer = umap.UMAP(n_neighbors=10, min_dist=0.08, n_components=2)
uy = reducer.fit_transform(X)
/app/data/.local/lib/python3.11/site-packages/sklearn/utils/deprecation.py:151: FutureWarning:
'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.
plt.figure(figsize=(12, 10))
plt.scatter(uy[:, 0], uy[:, 1])

# Add labels for each point
for i, (name, _) in enumerate(youtubers):
    plt.annotate(name, (uy[i, 0], uy[i, 1]), xytext=(5, 5), textcoords='offset points', fontsize=8)

plt.title('UMAP visualization of YouTuber embeddings')
plt.xlabel('UMAP component 1')
plt.ylabel('UMAP component 2')
# Adjust layout to prevent label clipping
plt.tight_layout()
plt.show()
import gaspard as gas

model = 'gemini-1.5-flash'
sp = "You are an expert at classifying YouTube channels with concise, but very understandable categories."
cli = gas.Client(model=model, sp=sp)

prompt = "Read the following Wikipedia entry about a youtuber with an educational channel. Classify their kind of video with a main category and a subcategory (if appropriate). Often the channels will cover a broad spectrum, but have a frequent specific focus. ONLY respond with the category/subcategory and focus. For example: Medicine - General [Focus: Mental Health]"
answer = cli([prompt, f"<entry>{texts[1]}</entry>"])
answer
Science - Earth Science [Focus: Geophysics and Paleoclimatology]
from time import sleep

def classify_channel(text):
    sleep(1.)  # pause between calls to stay under the API rate limit
    answer = cli([prompt, f"<entry>{text}</entry>"])
    return answer.text

classes = texts.map(classify_channel)
classes
(#64) ['Education - Science [Focus: Biology & Science Communication through music]\n','Science - Earth Science [Focus: Geophysics and Paleoclimatology]\n','Aviation - Aircraft Spotting [Focus: Airport Landings]\n','Engineering - Materials Science\n','Science - Neuroscience [Focus: Cognitive processes]\n','Science - Chemistry [Focus: Chemical Reactions and Compounds]\n','Education - Study Tips [Focus: Law School]\n','Education - Language Learning [Focus: Old Norse]\n','Medicine - Public Health [Focus: Healthcare System Critique & Advocacy]\n','Education - Geography/Economics/Logistics/Military [Focus: Aviation and Logistics]\n\nEducation - Trivia/Curiosities [Focus: Little-known facts]\n\nReality TV - Competition [Focus: Travel]\n\nReality TV - Game Show [Focus: Travel/Competition]','Medicine - General [Focus: Public Health & Misinformation]\n','Mathematics - Applied Mathematics [Focus: Complex Systems]\n','Education - Sex Education [Focus: Human Sexuality]\n','Education - Death & Dying [Focus: Funeral Practices & Cultural Attitudes]\n','Education - Language Learning [Focus: English]\n','Science Communication - STEM Outreach [Focus: Space Exploration and Engineering]\n','Education - History/Geography/Politics [Focus: British History, Politics, and Geography]\n','Science - Space Exploration [Focus: Extraterrestrial life and Future Technology]\n','Science - Natural History [Focus: Museum Specimens & Collections]\n','Education - General [Focus: Science and Humanities]\n'...]
AI Prompt
Let’s redo the plot adding the main category as a color, with a legend
AI Response
I’ll help you create a color-coded plot. First, let’s extract the main categories from our classifications and create a color map. Here’s how we can do it:
# Extract main categories (everything before the first '-')
main_cats = [c.split('-')[0].strip() for c in classes]

# Create a color map for unique categories
unique_cats = list(set(main_cats))
colors = plt.cm.tab20(np.linspace(0, 1, len(unique_cats)))
cat_to_color = dict(zip(unique_cats, colors))

# Create the plot
plt.figure(figsize=(15, 12))
for i, (name, cat) in enumerate(zip(youtubers, main_cats)):
    plt.scatter(uy[i, 0], uy[i, 1], color=cat_to_color[cat], label=cat)
    plt.annotate(name, (uy[i, 0], uy[i, 1]), xytext=(5, 5), textcoords='offset points', fontsize=8)

# Create legend without duplicates
handles = [plt.scatter([], [], color=cat_to_color[cat], label=cat) for cat in unique_cats]
plt.legend(handles=handles, title="Main Categories")

plt.title('UMAP visualization of YouTuber embeddings')
plt.xlabel('UMAP component 1')
plt.ylabel('UMAP component 2')

plt.tight_layout()
plt.show()
Would you like to try this? We can then adjust the colors, legend position, or other aspects if needed.
# Extract main categories (everything before the first '-')
main_cats = [c.split('-')[0].strip() for c in classes]

# Create a color map for unique categories
unique_cats = list(set(main_cats))
colors = plt.cm.tab20(np.linspace(0, 1, len(unique_cats)))
cat_to_color = dict(zip(unique_cats, colors))

# Create the plot
plt.figure(figsize=(15, 12))
for i, (name, cat) in enumerate(zip(youtubers, main_cats)):
    plt.scatter(uy[i, 0], uy[i, 1], color=cat_to_color[cat], label=cat)
    # youtubers holds (name, text) tuples, so annotate with just the name
    plt.annotate(name[0], (uy[i, 0], uy[i, 1]), xytext=(5, 5), textcoords='offset points', fontsize=8)

# Create legend without duplicates
handles = [plt.scatter([], [], color=cat_to_color[cat], label=cat) for cat in unique_cats]
plt.legend(handles=handles, title="Main Categories")

plt.title('UMAP visualization of YouTuber embeddings')
plt.xlabel('UMAP component 1')
plt.ylabel('UMAP component 2')

plt.tight_layout()
plt.show()
plt.figure(figsize=(15, 12))
for i, (name, cat) in enumerate(zip(youtubers, main_cats)):
    plt.scatter(ty[i, 0], ty[i, 1], color=cat_to_color[cat], label=cat)
    plt.annotate(name[0], (ty[i, 0], ty[i, 1]), xytext=(5, 5), textcoords='offset points', fontsize=8)

# Create legend without duplicates
handles = [plt.scatter([], [], color=cat_to_color[cat], label=cat) for cat in unique_cats]
plt.legend(handles=handles, title="Main Categories")

plt.title('t-SNE visualization of YouTuber embeddings')
plt.xlabel('t-SNE component 1')
plt.ylabel('t-SNE component 2')
# Adjust layout to prevent label clipping
plt.tight_layout()
plt.show()
from time import sleep

def classify_nat(text):
    prompt = "Read the following Wikipedia entry about a youtuber with an educational channel. What is the Youtuber nationality?"
    sleep(1.)
    answer = cli([prompt, f"<entry>{text}</entry>", "Respond ONLY with the nationality and nothing else."])
    return answer.text

nationalities = texts.map(classify_nat)
nationalities
(#64) ['American\n','French\n','British\n','German\n','Australian\n','Canadian\n','British\n','American\n','American\n','American\n','Russian-born American\n','Australian\n','American\n','American\n','British\n','Mexican\n','English\n','American\n','American\n','American\n'...]
nationalities.map(str.strip).unique()
(#18) ['American','French','British','German','Australian','Canadian','Russian-born American','Mexican','English','Algerian','Australian-British','New Zealander','Spanish','Ukrainian-American','Indian','Scottish','Colombian','Moroccan']
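Since several labels differ only in granularity ('British' vs 'English' vs 'Scottish') and a few are hyphenated compounds, it can help to tabulate counts before plotting; a small sketch using only the standard library:

from collections import Counter

# Count how many channels fall under each (whitespace-stripped) nationality label
Counter(n.strip() for n in nationalities).most_common()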
AI Prompt
Can we add the nationalities as a shape?
AI Response
Yes! We can use different markers for different nationalities. Here’s how we can modify the plot:
# Create a shape map for nationalities
unique_nations = list(set(n.strip() for n in nationalities))
markers = ['o', 's', '^', 'D', 'v', '<', '>', 'p', 'h', '8', '*', 'H', 'd', 'P', 'X', '+', 'x', '1']
nation_to_marker = dict(zip(unique_nations, markers))

plt.figure(figsize=(15, 12))
for i, ((name, _), cat, nat) in enumerate(zip(youtubers, main_cats, nationalities)):
    plt.scatter(uy[i, 0], uy[i, 1],
                color=cat_to_color[cat],
                marker=nation_to_marker[nat.strip()],
                s=100)  # Made markers a bit bigger
    plt.annotate(name, (uy[i, 0], uy[i, 1]), xytext=(5, 5),
                 textcoords='offset points', fontsize=8)

# Create two legends
handles_cat = [plt.scatter([], [], color=cat_to_color[cat], label=cat) for cat in unique_cats]
handles_nat = [plt.scatter([], [], color='gray', marker=nation_to_marker[nat],
                           label=nat, s=100) for nat in unique_nations]

plt.legend(handles_cat + handles_nat,
           [*unique_cats, *unique_nations],
           title="Categories and Nationalities",
           bbox_to_anchor=(1.05, 1), loc='upper left')

plt.title('UMAP visualization of YouTuber embeddings')
plt.xlabel('UMAP component 1')
plt.ylabel('UMAP component 2')

plt.tight_layout()
plt.show()
Would you like to try this? We can adjust the marker sizes, legend position, or other aspects if needed.
# Create a shape map for nationalities
unique_nations = list(set(n.strip() for n in nationalities))
markers = ['o', 's', '^', 'D', 'v', '<', '>', 'p', 'h', '8', '*', 'H', 'd', 'P', 'X', '+', 'x', '1']
nation_to_marker = dict(zip(unique_nations, markers))

plt.figure(figsize=(15, 12))
for i, ((name, _), cat, nat) in enumerate(zip(youtubers, main_cats, nationalities)):
    plt.scatter(uy[i, 0], uy[i, 1],
                color=cat_to_color[cat],
                marker=nation_to_marker[nat.strip()],
                s=100)  # Made markers a bit bigger
    plt.annotate(name, (uy[i, 0], uy[i, 1]), xytext=(5, 5),
                 textcoords='offset points', fontsize=8)

# Create two legends
handles_cat = [plt.scatter([], [], color=cat_to_color[cat], label=cat) for cat in unique_cats]
handles_nat = [plt.scatter([], [], color='gray', marker=nation_to_marker[nat],
                           label=nat, s=100) for nat in unique_nations]

plt.legend(handles_cat + handles_nat,
           [*unique_cats, *unique_nations],
           title="Categories and Nationalities",
           bbox_to_anchor=(1.05, 1), loc='upper left')

plt.title('UMAP visualization of YouTuber embeddings')
plt.xlabel('UMAP component 1')
plt.ylabel('UMAP component 2')

plt.tight_layout()
plt.show()
# Create a shape map for nationalities
unique_nations = list(set(n.strip() for n in nationalities))
markers = ['o', 's', '^', 'D', 'v', '<', '>', 'p', 'h', '8', '*', 'H', 'd', 'P', 'X', '+', 'x', '1']
nation_to_marker = dict(zip(unique_nations, markers))

plt.figure(figsize=(15, 12))
for i, ((name, _), cat, nat) in enumerate(zip(youtubers, main_cats, nationalities)):
    plt.scatter(ty[i, 0], ty[i, 1],
                color=cat_to_color[cat],
                marker=nation_to_marker[nat.strip()],
                s=100)  # Made markers a bit bigger
    plt.annotate(name, (ty[i, 0], ty[i, 1]), xytext=(5, 5),
                 textcoords='offset points', fontsize=8)

# Create two legends
handles_cat = [plt.scatter([], [], color=cat_to_color[cat], label=cat) for cat in unique_cats]
handles_nat = [plt.scatter([], [], color='gray', marker=nation_to_marker[nat],
                           label=nat, s=100) for nat in unique_nations]

plt.legend(handles_cat + handles_nat,
           [*unique_cats, *unique_nations],
           title="Categories and Nationalities",
           bbox_to_anchor=(1.05, 1), loc='upper left')

plt.title('tSNE visualization of YouTuber embeddings')
plt.xlabel('tSNE component 1')
plt.ylabel('tSNE component 2')

plt.tight_layout()
plt.show()