Welcome to curl_cffi’s documentation!

Chinese README | Discuss on Telegram

curl_cffi is a Python binding for curl-impersonate via cffi.

Unlike other pure-Python HTTP clients like httpx or requests, curl_cffi can impersonate browsers' TLS signatures and JA3 fingerprints. If you are blocked by a website for no obvious reason, give this package a try.
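
For example, switching the fingerprint takes a single keyword argument. A minimal sketch; the JSON key names follow the tls.browserleaks.com response used in the Basic Usage example below:

from curl_cffi import requests

url = "https://tls.browserleaks.com/json"

# Without impersonation the TLS handshake is curl's own and easy to
# fingerprint; with impersonate="chrome" the reported ja3n hash
# matches a real Chrome browser.
print(requests.get(url).json()["ja3n_hash"])
print(requests.get(url, impersonate="chrome").json()["ja3n_hash"])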


Scrapfly

Scrapfly is an enterprise-grade solution providing a Web Scraping API that aims to simplify the scraping process by managing everything: real browser rendering, rotating proxies, and fingerprints (TLS, HTTP, browser) to bypass all major anti-bots. Scrapfly also unlocks observability by providing an analytical dashboard and measuring success and block rates in detail.

Scrapfly is a good choice if you are looking for a cloud-managed solution for curl_cffi. If you are managing TLS/HTTP fingerprints yourself with curl_cffi, they also maintain a tool to convert a curl command into Python curl_cffi code!


Features

  • Supports impersonation of JA3/TLS and HTTP/2 fingerprints.

  • Much faster than requests/httpx, on par with aiohttp/pycurl; see the benchmarks.

  • Mimics the requests API, so there is no need to learn another one.

  • Pre-compiled, so you don’t have to compile on your machine.

  • Supports asyncio, with proxy rotation on each request (see the asyncio example below).

  • Supports HTTP/2, which requests does not.

  • Supports websocket.

Feature matrix

               requests   aiohttp   httpx   pycurl   curl_cffi
http2          ❌         ❌        ✅      ✅       ✅
sync           ✅         ❌        ✅      ✅       ✅
async          ❌         ✅        ✅      ❌       ✅
websocket      ❌         ✅        ❌      ❌       ✅
fingerprints   ❌         ❌        ❌      ❌       ✅
speed          🐇         🐇🐇      🐇      🐇🐇     🐇🐇

Install

pip install curl_cffi --upgrade

For more details, see Install.

Basic Usage

requests-like

from curl_cffi import requests

url = "https://tls.browserleaks.com/json"

# Notice the impersonate parameter
r = requests.get(url, impersonate="chrome")

print(r.json())
# output: {..., "ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc", ...}
# the ja3n fingerprint should be the same as the target browser's

# http/socks proxies are supported
proxies = {"https": "http://localhost:3128"}
r = requests.get(url, impersonate="chrome", proxies=proxies)

proxies = {"https": "socks://localhost:3128"}
r = requests.get(url, impersonate="chrome", proxies=proxies)

Sessions

s = requests.Session()

# httpbin is an HTTP test website
s.get("https://httpbin.org/cookies/set/foo/bar")

print(s.cookies)
# <Cookies[<Cookie foo=bar for httpbin.org />]>

r = s.get("https://httpbin.org/cookies")
print(r.json())
# {'cookies': {'foo': 'bar'}}
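
Per-request options such as impersonate also work on session requests, so a session can keep both its cookies and a consistent browser fingerprint. A short sketch reusing the parameters shown above:

s = requests.Session()

# Cookies persist on the session, and each request can still
# impersonate a browser, keeping the fingerprint consistent.
r = s.get("https://tls.browserleaks.com/json", impersonate="chrome")
print(r.json()["ja3n_hash"])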

asyncio

import asyncio
from curl_cffi.requests import AsyncSession

async def main():
    async with AsyncSession() as s:
        return await s.get("https://example.com")

r = asyncio.run(main())

More concurrency:

import asyncio
from curl_cffi.requests import AsyncSession

urls = [
    "https://google.com/",
    "https://facebook.com/",
    "https://twitter.com/",
]

async def main():
    async with AsyncSession() as s:
        tasks = []
        for url in urls:
            tasks.append(s.get(url))  # collect coroutines; not awaited yet
        return await asyncio.gather(*tasks)

results = asyncio.run(main())
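
Because proxies is a per-request parameter, rotating proxies on each request of an async session fits naturally into the same pattern. A sketch; the proxy URLs below are placeholders for your own pool:

import asyncio
import itertools
from curl_cffi.requests import AsyncSession

# Placeholder pool; substitute your own proxy URLs.
proxy_pool = itertools.cycle([
    "http://localhost:3128",
    "http://localhost:3129",
])

async def main():
    async with AsyncSession() as s:
        tasks = [
            # a different proxy from the pool for each request
            s.get("https://example.com/", proxies={"https": next(proxy_pool)})
            for _ in range(4)
        ]
        return await asyncio.gather(*tasks)

results = asyncio.run(main())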

WebSockets

from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()  # blocks, dispatching each incoming message to on_message
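
ws_connect accepts other callbacks besides on_message. A sketch, assuming the on_open and on_error hooks with the signatures below; check the API reference of your installed version:

def on_open(ws: WebSocket):
    # Called once the connection is established; a typical place to
    # send a subscribe message on feeds that require one.
    print("connected")

def on_error(ws: WebSocket, error):
    print("error:", error)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
    )
    ws.run_forever()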
