[{"data":1,"prerenderedAt":1470},["ShallowReactive",2],{"page-\u002Fscaling-python-web-scrapers\u002Fasynchronous-scraping-with-asyncio-and-httpx\u002F":3,"content-navigation":1318},{"id":4,"title":5,"body":6,"description":1311,"extension":1312,"meta":1313,"navigation":131,"path":1314,"seo":1315,"stem":1316,"__hash__":1317},"content\u002Fscaling-python-web-scrapers\u002Fasynchronous-scraping-with-asyncio-and-httpx\u002Findex.md","Asynchronous Scraping with asyncio and HTTPX",{"type":7,"value":8,"toc":1302},"minimark",[9,13,44,49,64,67,81,85,96,622,625,629,641,927,940,944,955,1161,1165,1192,1195,1199,1252,1256,1269,1280,1286,1298],[10,11,5],"h1",{"id":12},"asynchronous-scraping-with-asyncio-and-httpx",[14,15,16,17,21,22,25,26,33,34,38,39,43],"p",{},"Scraping is dominated by waiting. Each request spends most of its time idle — resolving DNS, opening a connection, and waiting for the server to respond. A sequential ",[18,19,20],"code",{},"requests"," loop wastes all of that idle time doing nothing. Asynchronous I\u002FO fixes this by keeping many requests in flight at once within a single thread, often turning an hour-long crawl into minutes. This guide shows how to do it correctly and politely with ",[18,23,24],{},"asyncio"," and ",[27,28,32],"a",{"href":29,"rel":30},"https:\u002F\u002Fwww.python-httpx.org\u002F",[31],"nofollow","HTTPX",". It builds on the request fundamentals in ",[27,35,37],{"href":36},"\u002Fthe-complete-guide-to-python-web-scraping\u002Funderstanding-http-requests-and-responses\u002F","Understanding HTTP Requests and Responses"," and fits into the broader ",[27,40,42],{"href":41},"\u002Fscaling-python-web-scrapers\u002F","Scaling & Deploying Python Web Scrapers"," workflow.",[45,46,48],"h2",{"id":47},"why-async-helps","Why Async Helps",[14,50,51,52,56,57,59,60,63],{},"Scraping is ",[53,54,55],"strong",{},"I\u002FO-bound",": the bottleneck is the network, not the CPU. 
",[18,58,24],{}," runs an event loop that, while one request waits for a response, switches to start or continue others. The result is high concurrency with very low overhead — no thread-per-request memory cost and no context-switching penalty. For CPU-bound work like heavy parsing, async does not help; reach for ",[18,61,62],{},"multiprocessing"," there instead.",[65,66],"diagram-sync-vs-async",{},[14,68,69,70,72,73,76,77,80],{},"HTTPX is the natural client for this. It offers a ",[18,71,20],{},"-like API, native ",[18,74,75],{},"async","\u002F",[18,78,79],{},"await"," support, HTTP\u002F2, and connection pooling, making it a drop-in upgrade path from synchronous code.",[45,82,84],{"id":83},"a-basic-async-scraper","A Basic Async Scraper",[14,86,87,88,91,92,95],{},"The pattern: create one ",[18,89,90],{},"AsyncClient",", build a list of coroutines, and run them concurrently with ",[18,93,94],{},"asyncio.gather",".",[97,98,103],"pre",{"className":99,"code":100,"language":101,"meta":102,"style":102},"language-python shiki shiki-themes material-theme-lighter github-light github-dark","import asyncio\nimport httpx\n\nasync def fetch(client: httpx.AsyncClient, url: str) -> str | None:\n    try:\n        response = await client.get(url, timeout=10)\n        response.raise_for_status()\n        return response.text\n    except httpx.HTTPError as exc:\n        print(f\"Failed {url}: {exc}\")\n        return None\n\nasync def scrape(urls: list[str]) -> list[str | None]:\n    async with httpx.AsyncClient(headers={\"User-Agent\": \"Mozilla\u002F5.0\"}) as client:\n        return await asyncio.gather(*(fetch(client, u) for u in urls))\n\nurls = [f\"https:\u002F\u002Fbooks.toscrape.com\u002Fcatalogue\u002Fpage-{i}.html\" for i in range(1, 21)]\nresults = asyncio.run(scrape(urls))\nprint(f\"Fetched {sum(r is not None for r in results)} 
pages\")\n","python","",[18,104,105,118,126,133,196,204,245,259,273,294,333,341,346,390,441,492,497,546,572],{"__ignoreMap":102},[106,107,110,114],"span",{"class":108,"line":109},"line",1,[106,111,113],{"class":112},"sVHd0","import",[106,115,117],{"class":116},"su5hD"," asyncio\n",[106,119,121,123],{"class":108,"line":120},2,[106,122,113],{"class":112},[106,124,125],{"class":116}," httpx\n",[106,127,129],{"class":108,"line":128},3,[106,130,132],{"emptyLinePlaceholder":131},true,"\n",[106,134,136,139,142,146,150,154,157,160,162,165,168,171,173,177,180,183,185,189,193],{"class":108,"line":135},4,[106,137,75],{"class":138},"sbsja",[106,140,141],{"class":138}," def",[106,143,145],{"class":144},"sGLFI"," fetch",[106,147,149],{"class":148},"sP7_E","(",[106,151,153],{"class":152},"sFwrP","client",[106,155,156],{"class":148},":",[106,158,159],{"class":116}," httpx",[106,161,95],{"class":148},[106,163,90],{"class":164},"skxfh",[106,166,167],{"class":148},",",[106,169,170],{"class":152}," url",[106,172,156],{"class":148},[106,174,176],{"class":175},"sZMiF"," str",[106,178,179],{"class":148},")",[106,181,182],{"class":148}," ->",[106,184,176],{"class":175},[106,186,188],{"class":187},"smGrS"," |",[106,190,192],{"class":191},"s39Yj"," None",[106,194,195],{"class":148},":\n",[106,197,199,202],{"class":108,"line":198},5,[106,200,201],{"class":112},"    try",[106,203,195],{"class":148},[106,205,207,210,213,216,219,221,225,227,230,232,236,238,242],{"class":108,"line":206},6,[106,208,209],{"class":116},"        response ",[106,211,212],{"class":187},"=",[106,214,215],{"class":112}," await",[106,217,218],{"class":116}," client",[106,220,95],{"class":148},[106,222,224],{"class":223},"slqww","get",[106,226,149],{"class":148},[106,228,229],{"class":223},"url",[106,231,167],{"class":148},[106,233,235],{"class":234},"s99_P"," 
timeout",[106,237,212],{"class":187},[106,239,241],{"class":240},"srdBf","10",[106,243,244],{"class":148},")\n",[106,246,248,251,253,256],{"class":108,"line":247},7,[106,249,250],{"class":116},"        response",[106,252,95],{"class":148},[106,254,255],{"class":223},"raise_for_status",[106,257,258],{"class":148},"()\n",[106,260,262,265,268,270],{"class":108,"line":261},8,[106,263,264],{"class":112},"        return",[106,266,267],{"class":116}," response",[106,269,95],{"class":148},[106,271,272],{"class":164},"text\n",[106,274,276,279,281,283,286,289,292],{"class":108,"line":275},9,[106,277,278],{"class":112},"    except",[106,280,159],{"class":116},[106,282,95],{"class":148},[106,284,285],{"class":164},"HTTPError",[106,287,288],{"class":112}," as",[106,290,291],{"class":116}," exc",[106,293,195],{"class":148},[106,295,297,301,303,306,310,313,315,318,321,323,326,328,331],{"class":108,"line":296},10,[106,298,300],{"class":299},"sptTA","        print",[106,302,149],{"class":148},[106,304,305],{"class":138},"f",[106,307,309],{"class":308},"s_sjI","\"Failed ",[106,311,312],{"class":240},"{",[106,314,229],{"class":223},[106,316,317],{"class":240},"}",[106,319,320],{"class":308},": ",[106,322,312],{"class":240},[106,324,325],{"class":223},"exc",[106,327,317],{"class":240},[106,329,330],{"class":308},"\"",[106,332,244],{"class":148},[106,334,336,338],{"class":108,"line":335},11,[106,337,264],{"class":112},[106,339,340],{"class":191}," None\n",[106,342,344],{"class":108,"line":343},12,[106,345,132],{"emptyLinePlaceholder":131},[106,347,349,351,353,356,358,361,363,366,369,372,375,377,379,381,383,385,387],{"class":108,"line":348},13,[106,350,75],{"class":138},[106,352,141],{"class":138},[106,354,355],{"class":144}," scrape",[106,357,149],{"class":148},[106,359,360],{"class":152},"urls",[106,362,156],{"class":148},[106,364,365],{"class":116}," 
list",[106,367,368],{"class":148},"[",[106,370,371],{"class":175},"str",[106,373,374],{"class":148},"])",[106,376,182],{"class":148},[106,378,365],{"class":116},[106,380,368],{"class":148},[106,382,371],{"class":175},[106,384,188],{"class":187},[106,386,192],{"class":191},[106,388,389],{"class":148},"]:\n",[106,391,393,396,399,401,403,405,407,410,412,414,417,420,422,424,427,430,432,435,437,439],{"class":108,"line":392},14,[106,394,395],{"class":112},"    async",[106,397,398],{"class":112}," with",[106,400,159],{"class":116},[106,402,95],{"class":148},[106,404,90],{"class":223},[106,406,149],{"class":148},[106,408,409],{"class":234},"headers",[106,411,212],{"class":187},[106,413,312],{"class":148},[106,415,330],{"class":416},"sjJ54",[106,418,419],{"class":308},"User-Agent",[106,421,330],{"class":416},[106,423,156],{"class":148},[106,425,426],{"class":416}," \"",[106,428,429],{"class":308},"Mozilla\u002F5.0",[106,431,330],{"class":416},[106,433,434],{"class":148},"})",[106,436,288],{"class":112},[106,438,218],{"class":116},[106,440,195],{"class":148},[106,442,444,446,448,451,453,456,458,461,463,466,468,470,472,475,477,480,483,486,489],{"class":108,"line":443},15,[106,445,264],{"class":112},[106,447,215],{"class":112},[106,449,450],{"class":116}," asyncio",[106,452,95],{"class":148},[106,454,455],{"class":223},"gather",[106,457,149],{"class":148},[106,459,460],{"class":187},"*",[106,462,149],{"class":148},[106,464,465],{"class":223},"fetch",[106,467,149],{"class":148},[106,469,153],{"class":223},[106,471,167],{"class":148},[106,473,474],{"class":223}," u",[106,476,179],{"class":148},[106,478,479],{"class":112}," for",[106,481,482],{"class":223}," u ",[106,484,485],{"class":112},"in",[106,487,488],{"class":223}," 
urls",[106,490,491],{"class":148},"))\n",[106,493,495],{"class":108,"line":494},16,[106,496,132],{"emptyLinePlaceholder":131},[106,498,500,503,505,508,510,513,515,518,520,523,525,528,530,533,535,538,540,543],{"class":108,"line":499},17,[106,501,502],{"class":116},"urls ",[106,504,212],{"class":187},[106,506,507],{"class":148}," [",[106,509,305],{"class":138},[106,511,512],{"class":308},"\"https:\u002F\u002Fbooks.toscrape.com\u002Fcatalogue\u002Fpage-",[106,514,312],{"class":240},[106,516,517],{"class":116},"i",[106,519,317],{"class":240},[106,521,522],{"class":308},".html\"",[106,524,479],{"class":112},[106,526,527],{"class":116}," i ",[106,529,485],{"class":112},[106,531,532],{"class":299}," range",[106,534,149],{"class":148},[106,536,537],{"class":240},"1",[106,539,167],{"class":148},[106,541,542],{"class":240}," 21",[106,544,545],{"class":148},")]\n",[106,547,549,552,554,556,558,561,563,566,568,570],{"class":108,"line":548},18,[106,550,551],{"class":116},"results ",[106,553,212],{"class":187},[106,555,450],{"class":116},[106,557,95],{"class":148},[106,559,560],{"class":223},"run",[106,562,149],{"class":148},[106,564,565],{"class":223},"scrape",[106,567,149],{"class":148},[106,569,360],{"class":223},[106,571,491],{"class":148},[106,573,575,578,580,582,585,587,590,592,595,598,601,603,605,608,610,613,615,617,620],{"class":108,"line":574},19,[106,576,577],{"class":299},"print",[106,579,149],{"class":148},[106,581,305],{"class":138},[106,583,584],{"class":308},"\"Fetched ",[106,586,312],{"class":240},[106,588,589],{"class":299},"sum",[106,591,149],{"class":148},[106,593,594],{"class":223},"r ",[106,596,597],{"class":112},"is",[106,599,600],{"class":112}," not",[106,602,192],{"class":191},[106,604,479],{"class":112},[106,606,607],{"class":223}," r ",[106,609,485],{"class":112},[106,611,612],{"class":223}," results",[106,614,179],{"class":148},[106,616,317],{"class":240},[106,618,619],{"class":308}," pages\"",[106,621,244],{"class":148},[14,623,624],{},"Reusing a 
single client matters: it pools connections, so you are not paying the TCP and TLS handshake cost on every request.",[45,626,628],{"id":627},"limiting-concurrency-the-semaphore","Limiting Concurrency: The Semaphore",[14,630,631,632,636,637,640],{},"The example above launches ",[633,634,635],"em",{},"all"," requests at once. Against twenty URLs that is fine; against twenty thousand it is a denial-of-service attack on the target and a guaranteed ban. The standard fix is an ",[18,638,639],{},"asyncio.Semaphore"," that caps how many requests run simultaneously.",[97,642,644],{"className":99,"code":643,"language":101,"meta":102,"style":102},"import asyncio\nimport httpx\n\nasync def fetch(client, url, semaphore):\n    async with semaphore:                 # only N run concurrently\n        response = await client.get(url, timeout=10)\n        response.raise_for_status()\n        await asyncio.sleep(0.2)          # small politeness delay\n        return response.text\n\nasync def scrape(urls, concurrency=10):\n    semaphore = asyncio.Semaphore(concurrency)\n    async with httpx.AsyncClient(headers={\"User-Agent\": \"Mozilla\u002F5.0\"}) as client:\n        tasks = [fetch(client, u, semaphore) for u in urls]\n        return await asyncio.gather(*tasks, return_exceptions=True)\n",[18,645,646,652,658,662,686,700,728,738,760,770,774,797,818,860,896],{"__ignoreMap":102},[106,647,648,650],{"class":108,"line":109},[106,649,113],{"class":112},[106,651,117],{"class":116},[106,653,654,656],{"class":108,"line":120},[106,655,113],{"class":112},[106,657,125],{"class":116},[106,659,660],{"class":108,"line":128},[106,661,132],{"emptyLinePlaceholder":131},[106,663,664,666,668,670,672,674,676,678,680,683],{"class":108,"line":135},[106,665,75],{"class":138},[106,667,141],{"class":138},[106,669,145],{"class":144},[106,671,149],{"class":148},[106,673,153],{"class":152},[106,675,167],{"class":148},[106,677,170],{"class":152},[106,679,167],{"class":148},[106,681,682],{"class":152}," 
semaphore",[106,684,685],{"class":148},"):\n",[106,687,688,690,692,694,696],{"class":108,"line":198},[106,689,395],{"class":112},[106,691,398],{"class":112},[106,693,682],{"class":116},[106,695,156],{"class":148},[106,697,699],{"class":698},"sutJx","                 # only N run concurrently\n",[106,701,702,704,706,708,710,712,714,716,718,720,722,724,726],{"class":108,"line":206},[106,703,209],{"class":116},[106,705,212],{"class":187},[106,707,215],{"class":112},[106,709,218],{"class":116},[106,711,95],{"class":148},[106,713,224],{"class":223},[106,715,149],{"class":148},[106,717,229],{"class":223},[106,719,167],{"class":148},[106,721,235],{"class":234},[106,723,212],{"class":187},[106,725,241],{"class":240},[106,727,244],{"class":148},[106,729,730,732,734,736],{"class":108,"line":247},[106,731,250],{"class":116},[106,733,95],{"class":148},[106,735,255],{"class":223},[106,737,258],{"class":148},[106,739,740,743,745,747,750,752,755,757],{"class":108,"line":261},[106,741,742],{"class":112},"        await",[106,744,450],{"class":116},[106,746,95],{"class":148},[106,748,749],{"class":223},"sleep",[106,751,149],{"class":148},[106,753,754],{"class":240},"0.2",[106,756,179],{"class":148},[106,758,759],{"class":698},"          # small politeness delay\n",[106,761,762,764,766,768],{"class":108,"line":275},[106,763,264],{"class":112},[106,765,267],{"class":116},[106,767,95],{"class":148},[106,769,272],{"class":164},[106,771,772],{"class":108,"line":296},[106,773,132],{"emptyLinePlaceholder":131},[106,775,776,778,780,782,784,786,788,791,793,795],{"class":108,"line":335},[106,777,75],{"class":138},[106,779,141],{"class":138},[106,781,355],{"class":144},[106,783,149],{"class":148},[106,785,360],{"class":152},[106,787,167],{"class":148},[106,789,790],{"class":152}," concurrency",[106,792,212],{"class":187},[106,794,241],{"class":240},[106,796,685],{"class":148},[106,798,799,802,804,806,808,811,813,816],{"class":108,"line":343},[106,800,801],{"class":116},"    semaphore 
",[106,803,212],{"class":187},[106,805,450],{"class":116},[106,807,95],{"class":148},[106,809,810],{"class":223},"Semaphore",[106,812,149],{"class":148},[106,814,815],{"class":223},"concurrency",[106,817,244],{"class":148},[106,819,820,822,824,826,828,830,832,834,836,838,840,842,844,846,848,850,852,854,856,858],{"class":108,"line":348},[106,821,395],{"class":112},[106,823,398],{"class":112},[106,825,159],{"class":116},[106,827,95],{"class":148},[106,829,90],{"class":223},[106,831,149],{"class":148},[106,833,409],{"class":234},[106,835,212],{"class":187},[106,837,312],{"class":148},[106,839,330],{"class":416},[106,841,419],{"class":308},[106,843,330],{"class":416},[106,845,156],{"class":148},[106,847,426],{"class":416},[106,849,429],{"class":308},[106,851,330],{"class":416},[106,853,434],{"class":148},[106,855,288],{"class":112},[106,857,218],{"class":116},[106,859,195],{"class":148},[106,861,862,865,867,869,871,873,875,877,879,881,883,885,887,889,891,893],{"class":108,"line":392},[106,863,864],{"class":116},"        tasks ",[106,866,212],{"class":187},[106,868,507],{"class":148},[106,870,465],{"class":223},[106,872,149],{"class":148},[106,874,153],{"class":223},[106,876,167],{"class":148},[106,878,474],{"class":223},[106,880,167],{"class":148},[106,882,682],{"class":223},[106,884,179],{"class":148},[106,886,479],{"class":112},[106,888,482],{"class":116},[106,890,485],{"class":112},[106,892,488],{"class":116},[106,894,895],{"class":148},"]\n",[106,897,898,900,902,904,906,908,910,912,915,917,920,922,925],{"class":108,"line":443},[106,899,264],{"class":112},[106,901,215],{"class":112},[106,903,450],{"class":116},[106,905,95],{"class":148},[106,907,455],{"class":223},[106,909,149],{"class":148},[106,911,460],{"class":187},[106,913,914],{"class":223},"tasks",[106,916,167],{"class":148},[106,918,919],{"class":234}," 
return_exceptions",[106,921,212],{"class":187},[106,923,924],{"class":191},"True",[106,926,244],{"class":148},[14,928,929,932,933,935,936,95],{},[18,930,931],{},"return_exceptions=True"," keeps one failed request from aborting the whole batch: instead of the first exception propagating to the caller and the other results being lost, failures come back as exception objects you can filter and retry. The semaphore plus a short ",[18,934,749],{}," gives you fast throughput while staying within a server's tolerance. This is the same politeness principle enforced automatically by ",[27,937,939],{"href":938},"\u002Fscaling-python-web-scrapers\u002Fweb-scraping-with-scrapy\u002F","Scrapy's AutoThrottle",[45,941,943],{"id":942},"adding-retries-with-backoff","Adding Retries with Backoff",[14,945,946,947,950,951,954],{},"Transient failures (",[18,948,949],{},"429",", ",[18,952,953],{},"503",", timeouts) are routine at scale. Wrap fetches in a retry loop with exponential backoff so a brief hiccup does not drop data.",[97,956,958],{"className":99,"code":957,"language":101,"meta":102,"style":102},"async def fetch_with_retry(client, url, semaphore, retries=3):\n    for attempt in range(retries):\n        try:\n            async with semaphore:\n                response = await client.get(url, timeout=10)\n                response.raise_for_status()\n                return response.text\n        except (httpx.HTTPStatusError, httpx.TransportError):\n            if attempt == retries - 1:\n                raise\n            await asyncio.sleep(2 ** attempt)   # waits 1s, then 2s, doubling each retry\n",[18,959,960,993,1012,1019,1030,1059,1070,1081,1108,1129,1134],{"__ignoreMap":102},[106,961,962,964,966,969,971,973,975,977,979,981,983,986,988,991],{"class":108,"line":109},[106,963,75],{"class":138},[106,965,141],{"class":138},[106,967,968],{"class":144}," 
fetch_with_retry",[106,970,149],{"class":148},[106,972,153],{"class":152},[106,974,167],{"class":148},[106,976,170],{"class":152},[106,978,167],{"class":148},[106,980,682],{"class":152},[106,982,167],{"class":148},[106,984,985],{"class":152}," retries",[106,987,212],{"class":187},[106,989,990],{"class":240},"3",[106,992,685],{"class":148},[106,994,995,998,1001,1003,1005,1007,1010],{"class":108,"line":120},[106,996,997],{"class":112},"    for",[106,999,1000],{"class":116}," attempt ",[106,1002,485],{"class":112},[106,1004,532],{"class":299},[106,1006,149],{"class":148},[106,1008,1009],{"class":223},"retries",[106,1011,685],{"class":148},[106,1013,1014,1017],{"class":108,"line":128},[106,1015,1016],{"class":112},"        try",[106,1018,195],{"class":148},[106,1020,1021,1024,1026,1028],{"class":108,"line":135},[106,1022,1023],{"class":112},"            async",[106,1025,398],{"class":112},[106,1027,682],{"class":116},[106,1029,195],{"class":148},[106,1031,1032,1035,1037,1039,1041,1043,1045,1047,1049,1051,1053,1055,1057],{"class":108,"line":198},[106,1033,1034],{"class":116},"                response ",[106,1036,212],{"class":187},[106,1038,215],{"class":112},[106,1040,218],{"class":116},[106,1042,95],{"class":148},[106,1044,224],{"class":223},[106,1046,149],{"class":148},[106,1048,229],{"class":223},[106,1050,167],{"class":148},[106,1052,235],{"class":234},[106,1054,212],{"class":187},[106,1056,241],{"class":240},[106,1058,244],{"class":148},[106,1060,1061,1064,1066,1068],{"class":108,"line":206},[106,1062,1063],{"class":116},"                response",[106,1065,95],{"class":148},[106,1067,255],{"class":223},[106,1069,258],{"class":148},[106,1071,1072,1075,1077,1079],{"class":108,"line":247},[106,1073,1074],{"class":112},"                return",[106,1076,267],{"class":116},[106,1078,95],{"class":148},[106,1080,272],{"class":164},[106,1082,1083,1086,1089,1092,1094,1097,1099,1101,1103,1106],{"class":108,"line":261},[106,1084,1085],{"class":112},"        
except",[106,1087,1088],{"class":148}," (",[106,1090,1091],{"class":116},"httpx",[106,1093,95],{"class":148},[106,1095,1096],{"class":164},"HTTPStatusError",[106,1098,167],{"class":148},[106,1100,159],{"class":116},[106,1102,95],{"class":148},[106,1104,1105],{"class":164},"TransportError",[106,1107,685],{"class":148},[106,1109,1110,1113,1115,1118,1121,1124,1127],{"class":108,"line":275},[106,1111,1112],{"class":112},"            if",[106,1114,1000],{"class":116},[106,1116,1117],{"class":187},"==",[106,1119,1120],{"class":116}," retries ",[106,1122,1123],{"class":187},"-",[106,1125,1126],{"class":240}," 1",[106,1128,195],{"class":148},[106,1130,1131],{"class":108,"line":296},[106,1132,1133],{"class":112},"                raise\n",[106,1135,1136,1139,1141,1143,1145,1147,1150,1153,1156,1158],{"class":108,"line":335},[106,1137,1138],{"class":112},"            await",[106,1140,450],{"class":116},[106,1142,95],{"class":148},[106,1144,749],{"class":223},[106,1146,149],{"class":148},[106,1148,1149],{"class":240},"2",[106,1151,1152],{"class":187}," **",[106,1154,1155],{"class":223}," attempt",[106,1157,179],{"class":148},[106,1159,1160],{"class":698},"   # waits 1s, then 2s, doubling each retry\n",[45,1162,1164],{"id":1163},"async-vs-threads-vs-multiprocessing","Async vs Threads vs Multiprocessing",[1166,1167,1168,1174,1186],"ul",{},[1169,1170,1171,1173],"li",{},[53,1172,24],{}," — best for high-concurrency, I\u002FO-bound scraping. Hundreds of in-flight requests in one process, minimal overhead. Requires async-compatible libraries.",[1169,1175,1176,1088,1179,1182,1183,1185],{},[53,1177,1178],{},"Threads",[18,1180,1181],{},"concurrent.futures.ThreadPoolExecutor",") — good for moderate concurrency and when you must use synchronous libraries like ",[18,1184,20],{},". 
Simpler mental model, higher per-thread overhead.",[1169,1187,1188,1191],{},[53,1189,1190],{},"Multiprocessing"," — for CPU-bound stages such as parsing huge documents or running heavy regex; it sidesteps the GIL by using separate processes.",[14,1193,1194],{},"A common production shape is async fetching feeding a process pool for CPU-heavy parsing.",[45,1196,1198],{"id":1197},"common-mistakes-to-avoid","Common Mistakes to Avoid",[1166,1200,1201,1209,1226,1235,1246],{},[1169,1202,1203,1208],{},[53,1204,1205,1206,156],{},"Unbounded ",[18,1207,455],{}," launching every request at once overwhelms the target and your own machine. Always gate with a semaphore.",[1169,1210,1211,1214,1215,1218,1219,1221,1222,1225],{},[53,1212,1213],{},"Calling blocking code in a coroutine:"," ",[18,1216,1217],{},"time.sleep()"," or synchronous ",[18,1220,20],{}," inside async code freezes the event loop. Use ",[18,1223,1224],{},"asyncio.sleep"," and an async client.",[1169,1227,1228,1231,1232,1234],{},[53,1229,1230],{},"Creating a client per request:"," that discards connection pooling. Create one ",[18,1233,90],{}," and reuse it.",[1169,1236,1237,1240,1241,1243,1244,95],{},[53,1238,1239],{},"Letting one failure kill the batch:"," use ",[18,1242,931],{}," or per-task try\u002Fexcept so a single error does not propagate out of ",[18,1245,455],{},[1169,1247,1248,1251],{},[53,1249,1250],{},"Assuming async speeds up parsing:"," async only helps with waiting. CPU-bound parsing needs multiprocessing.",[45,1253,1255],{"id":1254},"frequently-asked-questions","Frequently Asked Questions",[14,1257,1258,1261,1262,1264,1265,1268],{},[53,1259,1260],{},"Should I use HTTPX or aiohttp?","\nBoth are excellent. HTTPX has a ",[18,1263,20],{},"-like API and supports both sync and async, making migration easy; ",[18,1266,1267],{},"aiohttp"," is async-only and battle-tested for high-throughput clients. 
Either is a solid choice.",[14,1270,1271,1274,1275,76,1277,1279],{},[53,1272,1273],{},"How many concurrent requests should I allow?","\nStart around 5–10 per domain, monitor for ",[18,1276,949],{},[18,1278,953],{}," responses, and increase only if the server tolerates it. The right number depends entirely on the target's capacity and rules.",[14,1281,1282,1285],{},[53,1283,1284],{},"Is async scraping always faster?","\nOnly for I\u002FO-bound work, which most scraping is. If your bottleneck is parsing or data processing rather than network waiting, async will not help — profile first.",[14,1287,1288,1291,1292,1294,1295,95],{},[53,1289,1290],{},"Can I use async with Scrapy?","\nScrapy is already asynchronous under the hood, so you typically do not manage ",[18,1293,24],{}," yourself — you configure concurrency through its settings. See ",[27,1296,1297],{"href":938},"Web Scraping with Scrapy",[1299,1300,1301],"style",{},"html pre.shiki code .sVHd0, html code.shiki .sVHd0{--shiki-light:#39ADB5;--shiki-light-font-style:italic;--shiki-default:#D73A49;--shiki-default-font-style:inherit;--shiki-dark:#F97583;--shiki-dark-font-style:inherit}html pre.shiki code .su5hD, html code.shiki .su5hD{--shiki-light:#90A4AE;--shiki-default:#24292E;--shiki-dark:#E1E4E8}html pre.shiki code .sbsja, html code.shiki .sbsja{--shiki-light:#9C3EDA;--shiki-default:#D73A49;--shiki-dark:#F97583}html pre.shiki code .sGLFI, html code.shiki .sGLFI{--shiki-light:#6182B8;--shiki-default:#6F42C1;--shiki-dark:#B392F0}html pre.shiki code .sP7_E, html code.shiki .sP7_E{--shiki-light:#39ADB5;--shiki-default:#24292E;--shiki-dark:#E1E4E8}html pre.shiki code .sFwrP, html code.shiki .sFwrP{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#24292E;--shiki-default-font-style:inherit;--shiki-dark:#E1E4E8;--shiki-dark-font-style:inherit}html pre.shiki code .skxfh, html code.shiki .skxfh{--shiki-light:#E53935;--shiki-default:#24292E;--shiki-dark:#E1E4E8}html pre.shiki code .sZMiF, html 
code.shiki .sZMiF{--shiki-light:#E2931D;--shiki-default:#005CC5;--shiki-dark:#79B8FF}html pre.shiki code .smGrS, html code.shiki .smGrS{--shiki-light:#39ADB5;--shiki-default:#D73A49;--shiki-dark:#F97583}html pre.shiki code .s39Yj, html code.shiki .s39Yj{--shiki-light:#39ADB5;--shiki-default:#005CC5;--shiki-dark:#79B8FF}html pre.shiki code .slqww, html code.shiki .slqww{--shiki-light:#6182B8;--shiki-default:#24292E;--shiki-dark:#E1E4E8}html pre.shiki code .s99_P, html code.shiki .s99_P{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#E36209;--shiki-default-font-style:inherit;--shiki-dark:#FFAB70;--shiki-dark-font-style:inherit}html pre.shiki code .srdBf, html code.shiki .srdBf{--shiki-light:#F76D47;--shiki-default:#005CC5;--shiki-dark:#79B8FF}html pre.shiki code .sptTA, html code.shiki .sptTA{--shiki-light:#6182B8;--shiki-default:#005CC5;--shiki-dark:#79B8FF}html pre.shiki code .s_sjI, html code.shiki .s_sjI{--shiki-light:#91B859;--shiki-default:#032F62;--shiki-dark:#9ECBFF}html pre.shiki code .sjJ54, html code.shiki .sjJ54{--shiki-light:#39ADB5;--shiki-default:#032F62;--shiki-dark:#9ECBFF}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: 
var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .sutJx, html code.shiki .sutJx{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#6A737D;--shiki-default-font-style:inherit;--shiki-dark:#6A737D;--shiki-dark-font-style:inherit}",{"title":102,"searchDepth":120,"depth":120,"links":1303},[1304,1305,1306,1307,1308,1309,1310],{"id":47,"depth":120,"text":48},{"id":83,"depth":120,"text":84},{"id":627,"depth":120,"text":628},{"id":942,"depth":120,"text":943},{"id":1163,"depth":120,"text":1164},{"id":1197,"depth":120,"text":1198},{"id":1254,"depth":120,"text":1255},"Speed up Python scrapers with async I\u002FO — concurrent requests using asyncio and HTTPX, semaphore-based rate limiting, retries, and when to use async over threads.","md",{},"\u002Fscaling-python-web-scrapers\u002Fasynchronous-scraping-with-asyncio-and-httpx",{"title":5,"description":1311},"scaling-python-web-scrapers\u002Fasynchronous-scraping-with-asyncio-and-httpx\u002Findex","cAlEdlU_1hYnoOw118jBMABJkmR7xktFYcHhlFRLGJM",[1319,1369,1396],{"title":1320,"path":1321,"stem":1322,"children":1323},"Advanced Scraping Techniques Anti Bot Evasion","\u002Fadvanced-scraping-techniques-anti-bot-evasion","advanced-scraping-techniques-anti-bot-evasion",[1324,1327,1333,1345,1357],{"title":1325,"path":1321,"stem":1326},"Advanced Python Scraping & Anti-Bot Evasion","advanced-scraping-techniques-anti-bot-evasion\u002Findex",{"title":1328,"path":1329,"stem":1330,"children":1331},"Bypass Cloudflare & Akamai with 
Python","\u002Fadvanced-scraping-techniques-anti-bot-evasion\u002Fbypassing-cloudflare-and-akamai-protections","advanced-scraping-techniques-anti-bot-evasion\u002Fbypassing-cloudflare-and-akamai-protections\u002Findex",[1332],{"title":1328,"path":1329,"stem":1330},{"title":1334,"path":1335,"stem":1336,"children":1337},"Mastering Selenium for Dynamic Websites","\u002Fadvanced-scraping-techniques-anti-bot-evasion\u002Fmastering-selenium-for-dynamic-websites","advanced-scraping-techniques-anti-bot-evasion\u002Fmastering-selenium-for-dynamic-websites\u002Findex",[1338,1339],{"title":1334,"path":1335,"stem":1336},{"title":1340,"path":1341,"stem":1342,"children":1343},"Python Selenium Stealth Setup Guide","\u002Fadvanced-scraping-techniques-anti-bot-evasion\u002Fmastering-selenium-for-dynamic-websites\u002Fhow-to-configure-selenium-stealth-to-avoid-detection","advanced-scraping-techniques-anti-bot-evasion\u002Fmastering-selenium-for-dynamic-websites\u002Fhow-to-configure-selenium-stealth-to-avoid-detection\u002Findex",[1344],{"title":1340,"path":1341,"stem":1342},{"title":1346,"path":1347,"stem":1348,"children":1349},"Rotating Proxies & Managing IP Blocks","\u002Fadvanced-scraping-techniques-anti-bot-evasion\u002Frotating-proxies-and-managing-ip-blocks","advanced-scraping-techniques-anti-bot-evasion\u002Frotating-proxies-and-managing-ip-blocks\u002Findex",[1350,1351],{"title":1346,"path":1347,"stem":1348},{"title":1352,"path":1353,"stem":1354,"children":1355},"Best Proxy Providers for Python Scrapers","\u002Fadvanced-scraping-techniques-anti-bot-evasion\u002Frotating-proxies-and-managing-ip-blocks\u002Fbest-free-and-paid-proxy-providers-for-scraping","advanced-scraping-techniques-anti-bot-evasion\u002Frotating-proxies-and-managing-ip-blocks\u002Fbest-free-and-paid-proxy-providers-for-scraping\u002Findex",[1356],{"title":1352,"path":1353,"stem":1354},{"title":1358,"path":1359,"stem":1360,"children":1361},"Playwright for Python Web 
Automation","\u002Fadvanced-scraping-techniques-anti-bot-evasion\u002Fusing-playwright-for-modern-web-automation","advanced-scraping-techniques-anti-bot-evasion\u002Fusing-playwright-for-modern-web-automation\u002Findex",[1362,1363],{"title":1358,"path":1359,"stem":1360},{"title":1364,"path":1365,"stem":1366,"children":1367},"Playwright vs Selenium: Python Benchmarks","\u002Fadvanced-scraping-techniques-anti-bot-evasion\u002Fusing-playwright-for-modern-web-automation\u002Fplaywright-vs-selenium-performance-benchmarks","advanced-scraping-techniques-anti-bot-evasion\u002Fusing-playwright-for-modern-web-automation\u002Fplaywright-vs-selenium-performance-benchmarks\u002Findex",[1368],{"title":1364,"path":1365,"stem":1366},{"title":1370,"path":1371,"stem":1372,"children":1373},"Scaling Python Web Scrapers","\u002Fscaling-python-web-scrapers","scaling-python-web-scrapers",[1374,1376,1379,1385],{"title":42,"path":1371,"stem":1375},"scaling-python-web-scrapers\u002Findex",{"title":5,"path":1314,"stem":1316,"children":1377},[1378],{"title":5,"path":1314,"stem":1316},{"title":1380,"path":1381,"stem":1382,"children":1383},"Storing and Exporting Scraped Data","\u002Fscaling-python-web-scrapers\u002Fstoring-and-exporting-scraped-data","scaling-python-web-scrapers\u002Fstoring-and-exporting-scraped-data\u002Findex",[1384],{"title":1380,"path":1381,"stem":1382},{"title":1297,"path":1386,"stem":1387,"children":1388},"\u002Fscaling-python-web-scrapers\u002Fweb-scraping-with-scrapy","scaling-python-web-scrapers\u002Fweb-scraping-with-scrapy\u002Findex",[1389,1390],{"title":1297,"path":1386,"stem":1387},{"title":1391,"path":1392,"stem":1393,"children":1394},"Scrapy vs BeautifulSoup: Which to 
Use","\u002Fscaling-python-web-scrapers\u002Fweb-scraping-with-scrapy\u002Fscrapy-vs-beautifulsoup-which-to-use","scaling-python-web-scrapers\u002Fweb-scraping-with-scrapy\u002Fscrapy-vs-beautifulsoup-which-to-use\u002Findex",[1395],{"title":1391,"path":1392,"stem":1393},{"title":1397,"path":1398,"stem":1399,"children":1400},"The Complete Guide To Python Web Scraping","\u002Fthe-complete-guide-to-python-web-scraping","the-complete-guide-to-python-web-scraping",[1401,1404,1416,1428,1434,1446,1458],{"title":1402,"path":1398,"stem":1403},"The Complete Python Web Scraping Guide","the-complete-guide-to-python-web-scraping\u002Findex",{"title":1405,"path":1406,"stem":1407,"children":1408},"Regex Data Extraction in Python Scraping","\u002Fthe-complete-guide-to-python-web-scraping\u002Fextracting-data-with-regular-expressions","the-complete-guide-to-python-web-scraping\u002Fextracting-data-with-regular-expressions\u002Findex",[1409,1410],{"title":1405,"path":1406,"stem":1407},{"title":1411,"path":1412,"stem":1413,"children":1414},"Fix Unicode Errors in Python Web Scraping","\u002Fthe-complete-guide-to-python-web-scraping\u002Fextracting-data-with-regular-expressions\u002Ffixing-common-unicode-errors-in-python-scraping","the-complete-guide-to-python-web-scraping\u002Fextracting-data-with-regular-expressions\u002Ffixing-common-unicode-errors-in-python-scraping\u002Findex",[1415],{"title":1411,"path":1412,"stem":1413},{"title":1417,"path":1418,"stem":1419,"children":1420},"Pagination & Infinite Scroll in Python","\u002Fthe-complete-guide-to-python-web-scraping\u002Fhandling-pagination-and-infinite-scroll","the-complete-guide-to-python-web-scraping\u002Fhandling-pagination-and-infinite-scroll\u002Findex",[1421,1422],{"title":1417,"path":1418,"stem":1419},{"title":1423,"path":1424,"stem":1425,"children":1426},"Scrape Static Sites Without Getting 
Blocked","\u002Fthe-complete-guide-to-python-web-scraping\u002Fhandling-pagination-and-infinite-scroll\u002Fhow-to-scrape-a-static-website-without-getting-blocked","the-complete-guide-to-python-web-scraping\u002Fhandling-pagination-and-infinite-scroll\u002Fhow-to-scrape-a-static-website-without-getting-blocked\u002Findex",[1427],{"title":1423,"path":1424,"stem":1425},{"title":1429,"path":1430,"stem":1431,"children":1432},"Managing Cookies & Sessions in Python","\u002Fthe-complete-guide-to-python-web-scraping\u002Fmanaging-cookies-and-sessions","the-complete-guide-to-python-web-scraping\u002Fmanaging-cookies-and-sessions\u002Findex",[1433],{"title":1429,"path":1430,"stem":1431},{"title":1435,"path":1436,"stem":1437,"children":1438},"Parsing HTML with BeautifulSoup in Python","\u002Fthe-complete-guide-to-python-web-scraping\u002Fparsing-html-with-beautifulsoup","the-complete-guide-to-python-web-scraping\u002Fparsing-html-with-beautifulsoup\u002Findex",[1439,1440],{"title":1435,"path":1436,"stem":1437},{"title":1441,"path":1442,"stem":1443,"children":1444},"BeautifulSoup vs lxml Speed Comparison","\u002Fthe-complete-guide-to-python-web-scraping\u002Fparsing-html-with-beautifulsoup\u002Fbeautifulsoup-vs-lxml-which-parser-is-faster","the-complete-guide-to-python-web-scraping\u002Fparsing-html-with-beautifulsoup\u002Fbeautifulsoup-vs-lxml-which-parser-is-faster\u002Findex",[1445],{"title":1441,"path":1442,"stem":1443},{"title":1447,"path":1448,"stem":1449,"children":1450},"Setting Up Your Python Scraping Environment","\u002Fthe-complete-guide-to-python-web-scraping\u002Fsetting-up-your-python-scraping-environment","the-complete-guide-to-python-web-scraping\u002Fsetting-up-your-python-scraping-environment\u002Findex",[1451,1452],{"title":1447,"path":1448,"stem":1449},{"title":1453,"path":1454,"stem":1455,"children":1456},"Install Python & Requests for 
Beginners","\u002Fthe-complete-guide-to-python-web-scraping\u002Fsetting-up-your-python-scraping-environment\u002Fhow-to-install-python-and-requests-for-beginners","the-complete-guide-to-python-web-scraping\u002Fsetting-up-your-python-scraping-environment\u002Fhow-to-install-python-and-requests-for-beginners\u002Findex",[1457],{"title":1453,"path":1454,"stem":1455},{"title":1459,"path":1460,"stem":1461,"children":1462},"HTTP Requests & Responses for Scrapers","\u002Fthe-complete-guide-to-python-web-scraping\u002Funderstanding-http-requests-and-responses","the-complete-guide-to-python-web-scraping\u002Funderstanding-http-requests-and-responses\u002Findex",[1463,1464],{"title":1459,"path":1460,"stem":1461},{"title":1465,"path":1466,"stem":1467,"children":1468},"Extract HTML Tables with Python","\u002Fthe-complete-guide-to-python-web-scraping\u002Funderstanding-http-requests-and-responses\u002Fstep-by-step-guide-to-extracting-tables-from-html","the-complete-guide-to-python-web-scraping\u002Funderstanding-http-requests-and-responses\u002Fstep-by-step-guide-to-extracting-tables-from-html\u002Findex",[1469],{"title":1465,"path":1466,"stem":1467},1781700487013]