HTML Parsing

Dependency

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>

HTML to parse

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
    <div id="a" title="a">aaa</div>
    <div class="b" title="b">bbb</div>
</body>
</html>

Code

package com.example.html;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseHtmlApp {
    public static void main(String[] args) {
        String html = "<!DOCTYPE html>\n" +
                "<html lang=\"en\">\n" +
                "<head>\n" +
                "\t<meta charset=\"UTF-8\">\n" +
                "\t<title>Title</title>\n" +
                "</head>\n" +
                "<body>\n" +
                "\t<div id=\"a\" title=\"a\">aaa</div>\n" +
                "\t<div class=\"b\" title=\"b\">bbb</div>\n" +
                "</body>\n" +
                "</html>";

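        // Parse the HTML string into a DOM-like Document.
        Document document = Jsoup.parse(html);

        // Select every <div> and print its text content.
        Elements divElements = document.select("div");
        for (Element divElement : divElements) {
            System.out.println(divElement.text());
        }

        // Select the <div> with id "a" and read its title attribute.
        Element divElement = document.selectFirst("div#a");
        String title = divElement.attr("title");
        System.out.println(title);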
    }
}

Manipulating jsoup elements

Removing children

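// Remove the first element nested inside a div (selectFirst returns null if nothing matches).
document.selectFirst("div *").remove();
// Remove every child node, including text, from the first div.
document.selectFirst("div").empty();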
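
The two calls above behave differently, and the first one throws a NullPointerException when no div contains a nested element (as is the case for the sample HTML in this post). The following is a minimal sketch of a null-safe version; the class and method names (RemoveChildrenSketch, removeFirstNestedElement, emptyFirstDiv) are hypothetical and only serve the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of the original post.
public class RemoveChildrenSketch {

    // Removes the first element nested inside any <div>, if there is one.
    static Document removeFirstNestedElement(String html) {
        Document document = Jsoup.parse(html);
        Element nested = document.selectFirst("div *");
        if (nested != null) { // selectFirst returns null when nothing matches
            nested.remove();
        }
        return document;
    }

    // Empties the first <div>: all of its child nodes, text included, are removed.
    static Document emptyFirstDiv(String html) {
        Document document = Jsoup.parse(html);
        Element div = document.selectFirst("div");
        if (div != null) {
            div.empty();
        }
        return document;
    }

    public static void main(String[] args) {
        String html = "<div id=\"a\"><span>x</span>aaa</div>";
        System.out.println(removeFirstNestedElement(html).body().html());
        System.out.println(emptyFirstDiv(html).body().html());
    }
}
```

remove() detaches only the matched element and leaves its siblings in place, while empty() keeps the div itself and discards everything inside it.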

Adding a child

Element div = new Element("div").text("I'm a child");

document.selectFirst("div").appendChild(div);
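
To see the effect of appendChild end to end, here is a small self-contained sketch. The class and method names (AppendChildSketch, appendChildDiv) are hypothetical, and the input is a shortened version of the sample HTML.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of the original post.
public class AppendChildSketch {

    // Appends a new <div> with the given text as the last child of the first <div>.
    static Document appendChildDiv(String html, String childText) {
        Document document = Jsoup.parse(html);
        Element parent = document.selectFirst("div");
        if (parent != null) { // nothing to append to when the document has no <div>
            Element child = new Element("div").text(childText);
            parent.appendChild(child);
        }
        return document;
    }

    public static void main(String[] args) {
        Document document = appendChildDiv("<div id=\"a\" title=\"a\">aaa</div>", "I'm a child");
        // The new <div> is now nested inside div#a, after the existing text.
        System.out.println(document.body().html());
    }
}
```

When the child does not need to be built separately, parent.appendElement("div").text(childText) should give the same result in one chained call.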