HTML Parsing
Dependency
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>
HTML document to parse
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
    <div id="a" title="a">aaa</div>
    <div class="b" title="b">bbb</div>
</body>
</html>
Code
package com.example.html;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseHtmlApp {
    public static void main(String[] args) {
        String html = "<!DOCTYPE html>\n" +
                "<html lang=\"en\">\n" +
                "<head>\n" +
                "\t<meta charset=\"UTF-8\">\n" +
                "\t<title>Title</title>\n" +
                "</head>\n" +
                "<body>\n" +
                "\t<div id=\"a\" title=\"a\">aaa</div>\n" +
                "\t<div class=\"b\" title=\"b\">bbb</div>\n" +
                "</body>\n" +
                "</html>";

        // Parse the HTML string into a DOM-like Document.
        Document document = Jsoup.parse(html);

        // Select every div and print its text: "aaa", then "bbb".
        Elements divElements = document.select("div");
        for (Element divElement : divElements) {
            System.out.println(divElement.text());
        }

        // Select the div with id "a" and print its title attribute: "a".
        Element divElement = document.selectFirst("div#a");
        String title = divElement.attr("title");
        System.out.println(title);
    }
}
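The same select-then-read pattern also works with a class selector such as div.b. The following is a minimal sketch, not part of the original code: the class name SelectByClassExample and the helper titleOf are hypothetical names introduced for illustration. It also guards against selectFirst returning null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not from the original post.
public class SelectByClassExample {

    // Returns the title attribute of the first element matching the selector,
    // or an empty string when nothing matches (selectFirst returns null then).
    static String titleOf(Document document, String cssQuery) {
        Element element = document.selectFirst(cssQuery);
        return element == null ? "" : element.attr("title");
    }

    public static void main(String[] args) {
        Document document = Jsoup.parse(
                "<div id=\"a\" title=\"a\">aaa</div>"
                + "<div class=\"b\" title=\"b\">bbb</div>");

        System.out.println(titleOf(document, "div.b"));       // class selector
        System.out.println(titleOf(document, "div#a"));       // id selector
        System.out.println(titleOf(document, "div.missing")); // no match: empty line
    }
}
```

Returning an empty string on a miss is just one option; returning an Optional or throwing would work as well, depending on how the caller wants to handle missing elements.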
Manipulating jsoup elements
Removing children
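// Remove the first element nested inside a div (selectFirst returns null if no div has a child element).
document.selectFirst("div *").remove();
// Remove all child nodes of the first div, including its text.
document.selectFirst("div").empty();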
Adding a child
// Create a new div element and append it as the last child of the first div.
Element div = new Element("div").text("I'm a child");
document.selectFirst("div").appendChild(div);
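The removal and append calls above can be combined into one runnable sketch. ManipulateExample and removeFirstDivChild are hypothetical names, and the input HTML is an assumed variation of the sample document with a span added so that "div *" has something to match. In the original sample the divs contain only text, so selectFirst("div *") would return null there.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not from the original post.
public class ManipulateExample {

    // Removes the first element nested under any div, if there is one.
    // Returns true when something was removed.
    static boolean removeFirstDivChild(Document document) {
        Element child = document.selectFirst("div *");
        if (child == null) {
            return false; // e.g. divs that contain only text
        }
        child.remove();
        return true;
    }

    public static void main(String[] args) {
        // Assumed input: a div holding one child element plus text.
        Document document = Jsoup.parse("<div id=\"a\"><span>x</span>aaa</div>");
        Element div = document.selectFirst("div#a");

        System.out.println(removeFirstDivChild(document)); // true: the span is removed
        System.out.println(removeFirstDivChild(document)); // false: only text is left

        div.empty(); // drops the remaining text node as well
        div.appendChild(new Element("p").text("I'm a child"));
        System.out.println(div.html()); // the div now holds only the new p element
    }
}
```

Note the difference between the two removal calls: remove() detaches the selected element itself from its parent, while empty() keeps the element and clears everything inside it.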