Using Lightpanda

1. Background

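Ruan Yifeng's "Weekly for Tech Enthusiasts (Issue 336): Facing AI, the Internet Is in Decline" mentions a project called Lightpanda.

This post looks at how to use that project in a web scraper.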

Download and install

Linux

wget https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux && chmod a+x ./lightpanda-x86_64-linux && mv lightpanda-x86_64-linux lightpanda

macOS

wget https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-aarch64-macos && chmod a+x ./lightpanda-aarch64-macos && mv lightpanda-aarch64-macos lightpanda

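Only macOS and Linux builds are available. On either platform, make the downloaded binary executable and rename it to lightpanda.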
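If the download step is scripted from Node.js, the choice between the two assets can be made from process.platform and process.arch. The following is a minimal sketch: pickAsset is a hypothetical helper, and it lists only the two nightly assets named in this post.

```javascript
// Sketch: map a platform/arch pair to the nightly asset named in this post.
// Only the two assets mentioned above are listed; anything else is rejected.
const RELEASE_BASE =
  'https://github.com/lightpanda-io/browser/releases/download/nightly';

const ASSETS = {
  'linux-x64': 'lightpanda-x86_64-linux',
  'darwin-arm64': 'lightpanda-aarch64-macos',
};

// pickAsset is a hypothetical helper, not part of Lightpanda.
function pickAsset(platform, arch) {
  const name = ASSETS[`${platform}-${arch}`];
  if (!name) {
    throw new Error(`no Lightpanda build listed for ${platform}-${arch}`);
  }
  return { name, url: `${RELEASE_BASE}/${name}` };
}

// In a real script, pass process.platform and process.arch instead.
console.log(pickAsset('linux', 'x64').url);
```

Failing loudly on an unlisted platform is a deliberate choice here, so a script does not silently download a binary that cannot run.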

Feature overview

❯ ./lightpanda -h
usage: ./lightpanda [options] [URL]

start Lightpanda browser

  • if an url is provided the browser will fetch the page and exit
  • otherwhise the browser starts a CDP server

-h, --help Print this help message and exit.
--verbose Display all logs. By default only info, warn and err levels are displayed.
--host Host of the CDP server (default "127.0.0.1")
--port Port of the CDP server (default "9222")
--timeout Timeout for incoming connections of the CDP server (in seconds, default "3")
--dump Dump document in stdout (fetch mode only)

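If lightpanda is invoked with options only and no URL (for example just --host, --port, or --timeout), it starts a CDP server. According to the help text, --dump only takes effect in fetch mode.
If a URL is given, it fetches the page and exits. This mode is probably not needed very often.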
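The two modes can be made explicit when the command line is built from code. This is a sketch under assumptions: buildArgs is a hypothetical helper, the flag names are taken from the help output above, and leaving --dump out in server mode is this sketch's own choice.

```javascript
// Hypothetical helper: build the argument list for ./lightpanda from the
// options shown in the help text.
function buildArgs({ url, dump = false, host, port, timeout } = {}) {
  const args = [];
  if (url) {
    // Fetch mode: a URL is present, so the browser fetches the page and exits.
    // The help text marks --dump as "fetch mode only".
    if (dump) args.push('--dump');
    args.push(url);
    return { mode: 'fetch', args };
  }
  // CDP server mode: no URL, only server options.
  // This sketch simply leaves --dump out here.
  if (host !== undefined) args.push('--host', String(host));
  if (port !== undefined) args.push('--port', String(port));
  if (timeout !== undefined) args.push('--timeout', String(timeout));
  return { mode: 'cdp', args };
}

console.log(buildArgs({ url: 'https://baidu.com', dump: true }));
console.log(buildArgs({ host: '127.0.0.1', port: 9222 }));
```

The returned args array can be handed to a process launcher such as Node's child_process.spawn.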

Using it in a project

Start the CDP server

./lightpanda --host 127.0.0.1 --port 9222
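A client started right after the server may try to connect before the port is accepting connections. One way to handle that is a small retry loop. This is only a sketch: connectWithRetry, attempts, and delayMs are hypothetical names, not part of Lightpanda or Puppeteer.

```javascript
// Hypothetical helper: retry an async connect function a few times, since
// the CDP server may need a moment before it accepts connections.
async function connectWithRetry(connect, { attempts = 5, delayMs = 200 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt (skipped after the final failure).
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage with the endpoint from this post (puppeteer imported from 'puppeteer-core'):
// const browser = await connectWithRetry(() =>
//   puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:9222' }));
```

Passing the connect call in as a function keeps the helper independent of any particular client library.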

Visit a URL

import puppeteer from 'puppeteer-core';

// Connect to the running Lightpanda CDP server.
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

// Open a page in a fresh browser context and print its HTML.
const context = await browser.createBrowserContext();
const page = await context.newPage();
await page.goto('https://baidu.com');
console.log(await page.content());

// Clean up: close the page and context, then drop the connection.
await page.close();
await context.close();
await browser.disconnect();
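When page.goto or page.content throws, the close calls in the example above are never reached. A try/finally wrapper avoids that. This is a minimal sketch: withPage is a hypothetical helper, and it uses only calls that already appear in the example.

```javascript
// Hypothetical helper: run `fn` with a fresh page and always clean up,
// even if `fn` throws.
async function withPage(browser, fn) {
  const context = await browser.createBrowserContext();
  try {
    const page = await context.newPage();
    try {
      return await fn(page);
    } finally {
      await page.close();
    }
  } finally {
    await context.close();
  }
}

// Usage:
// const html = await withPage(browser, async (page) => {
//   await page.goto('https://baidu.com');
//   return page.content();
// });
```

The nested finally blocks close the page first and the context second, the same order as in the original script.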

Execute JavaScript

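import puppeteer from 'puppeteer-core';

// Connect to the running Lightpanda CDP server.
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

const context = await browser.createBrowserContext();
const page = await context.newPage();
await page.goto('https://baidu.com');

// The callback runs inside the page; its return value is serialized
// and sent back to this script.
const data = await page.evaluate(() => {
  const arr = [];
  for (let i = 0; i < 10; i++) {
    arr.push(i);
  }
  return arr;
});

console.log(data);

// Clean up: close the page and context, then drop the connection.
await page.close();
await context.close();
await browser.disconnect();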
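The loop bound in the example is hard-coded inside the callback. Puppeteer's page.evaluate forwards extra arguments to the page function, so the value can be passed in from the script. The sketch below assumes Lightpanda handles evaluate arguments the same way it handles the call above; range and collectRange are hypothetical names.

```javascript
// The function that runs inside the page. It must be self-contained,
// because it is serialized and sent to the browser.
function range(n) {
  const arr = [];
  for (let i = 0; i < n; i++) arr.push(i);
  return arr;
}

// Hypothetical wrapper: extra arguments to page.evaluate are passed to the
// page function, so `n` does not need to be hard-coded.
async function collectRange(page, n) {
  return page.evaluate(range, n);
}

// Usage: const data = await collectRange(page, 10);
```

Keeping range as a standalone function also makes it easy to check outside the browser.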

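Drawbacks

Lightpanda panics very easily, and a panic terminates the process. Its feature coverage is also still incomplete, so it cannot yet be used in a normal production environment.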

info(server): accepting new conn…
thread 1243245 panic: attempt to use null value
/home/runner/work/browser/browser/vendor/zig-js-runtime/src/engines/v8/callback.zig:0:0: 0x17b91d3 in call__anon_36850 (lightpanda)
/home/runner/work/browser/browser/vendor/zig-js-runtime/src/loop.zig:141:24: 0x176c348 in wrapper (lightpanda)
/opt/hostedtoolcache/zig/0.13.0/x64/lib/std/debug.zig:0:9: 0x1531192 in run_for_ns (lightpanda)
/home/runner/work/browser/browser/src/server.zig:476:31: 0x152f5e8 in handle (lightpanda)
/opt/hostedtoolcache/zig/0.13.0/x64/lib/std/Thread.zig:429:13: 0x15321fe in entryFn (lightpanda)
./nptl/pthread_create.c:442:8: 0x7f445177fac2 in start_thread (pthread_create.c)
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81:0: 0x7f445181184f in ??? (../sysdeps/unix/sysv/linux/x86_64/clone3.S)
???:?:?: 0x0 in ??? (???)
[1] 1243244 IOT instruction ./lightpanda
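For experiments, a crash like this can be worked around by restarting the process when it exits. The sketch below is not a fix for the panic itself: supervise, maxRestarts, and onGiveUp are hypothetical names, and the process launcher is passed in so the logic stays independent of how the binary is started.

```javascript
// Hypothetical supervisor: restart a child process whenever it exits,
// up to `maxRestarts` times. `start` must return an object with
// `on('exit', callback)`, as Node's child_process.spawn does.
function supervise(start, { maxRestarts = 3, onGiveUp = () => {} } = {}) {
  let restarts = 0;
  const launch = () => {
    const child = start();
    child.on('exit', (code, signal) => {
      if (restarts >= maxRestarts) {
        // Stop restarting so a persistent crash cannot loop forever.
        onGiveUp(code, signal);
        return;
      }
      restarts += 1;
      launch();
    });
  };
  launch();
}

// Usage (assumes the binary sits in the current directory):
// import { spawn } from 'node:child_process';
// supervise(() => spawn('./lightpanda',
//   ['--host', '127.0.0.1', '--port', '9222'], { stdio: 'inherit' }));
```

A restart drops every open CDP connection, so clients still have to reconnect and repeat whatever they were doing.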


https://kingjem.github.io/2025/02/12/使用/
Author: Ruhai
Published: February 12, 2025