Batch download images from Xiaohongshu
- Open Firefox developer tools.
- Search for the HTML class name `change-pic` and delete it.
- Press `C-S-c` to re-pick the HTML element containing the image area.
- Select the parent HTML node of the currently focused node, which is something like `<ul class="slide" data-v-f3680c6c="">`.
- Press `C-c` to copy the HTML code of the node and save it to a file, e.g. `album.html`.
- Use a combination of several commands, `grep`, `sed`, `xargs` and `wget`, to extract the image URLs and download all of them.

    grep -P -o '(?<=url\().*?(?=\))' album.html | sed "s/&quot;/\"/g" | sed "s/\/\//https:\/\//" | xargs wget

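It can help to preview the extracted URLs before downloading anything. The following is a minimal sketch of the same pipeline wrapped in a function. It assumes GNU `grep` (needed for `-P`) and that the copied node was saved as `album.html`; the function name `extract_urls` is a hypothetical helper, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch only: same grep/sed stages as the one-liner, without the download.
# extract_urls is a hypothetical helper name; GNU grep is assumed for -P.
extract_urls() {
    grep -P -o '(?<=url\().*?(?=\))' "$1" |   # text between "url(" and ")"
        sed 's/&quot;/"/g' |                  # HTML-escaped quote -> plain quote
        sed 's/\/\//https:\/\//'              # first "//" -> "https://"
}

# Preview the URLs (each line is still wrapped in double quotes):
#   extract_urls album.html
# Download them once the list looks right:
#   extract_urls album.html | xargs wget
```

Keeping the download as a separate, final step means a wrong selection in the developer tools shows up as a wrong URL list rather than as a batch of failed requests.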
N.B. In the pipe-concatenated commands above, `grep` uses the option `-P` to enable Perl-compatible regular expressions, and its option `-o` outputs only the matched part of each line. In the regular expression `(?<=url\().*?(?=\))`, `(?<=url\()` is a positive look-behind assertion, while `(?=\))` is a positive look-ahead assertion. `.*?` matches, in non-greedy mode, any characters between the pair of parentheses `()`. The total effect of this regular expression is to extract the URL from a string like the one below.

    url("//ci.xiaohongshu.com/8f9e2cf5-ea58-ef7f-b3bb-65e8ede2798c?imageView2/2/w/1080/format/jpg")

In the saved HTML the quotes appear as the entity `&quot;`, so `sed` first replaces `&quot;` with `"` and then `//` with `https://`. Finally, `xargs` passes the extracted URLs as arguments to `wget`, which downloads the images.
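To make the explanation concrete, the stages can be traced one at a time on the sample string. This is a sketch under two assumptions: GNU `grep` is available, and the quotes in the saved file are HTML-escaped as `&quot;`. The variable names are illustrative, and `echo` stands in for `wget` so that nothing is downloaded.

```shell
#!/usr/bin/env bash
# Sample string from the text, with quotes written as &quot; (assumed to
# match what the copied HTML node contains).
sample='url(&quot;//ci.xiaohongshu.com/8f9e2cf5-ea58-ef7f-b3bb-65e8ede2798c?imageView2/2/w/1080/format/jpg&quot;)'

# Stage 1: keep only what sits between "url(" and the next ")".
stage1=$(printf '%s\n' "$sample" | grep -P -o '(?<=url\().*?(?=\))')

# Stage 2: turn the HTML entity back into a plain double quote.
stage2=$(printf '%s\n' "$stage1" | sed 's/&quot;/"/g')

# Stage 3: prefix the protocol-relative URL with "https:".
stage3=$(printf '%s\n' "$stage2" | sed 's/\/\//https:\/\//')

# Stage 4: xargs treats the double quotes as quoting and removes them
# before handing the URL to the command (echo here instead of wget).
stage4=$(printf '%s\n' "$stage3" | xargs echo)

printf '%s\n' "$stage1" "$stage2" "$stage3" "$stage4"
```

The last stage shows why the quotes left in place by `sed` do no harm: `xargs` removes them before the URL is passed on as an argument.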