[Elasticsearch] Using the Elasticsearch Client in Python

코딩하는 앤지 2024. 7. 16.

Elasticsearch

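Elasticsearch is an open-source distributed search engine built on Apache Lucene.
It uses Lucene's inverted index, which maps each term to the documents that contain it, so the words in a document can be indexed and searched quickly.
Besides serving as a search engine, Elasticsearch is also used for log analysis and real-time monitoring.
In particular, it is often combined with Logstash and Kibana to cover data collection, storage, analysis, and visualization.
The Elasticsearch Python Client makes it easy to integrate Elasticsearch into a Python application.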

1) Installing the Elasticsearch Python Client

The Elasticsearch Python Client can be installed with pip.

pip install elasticsearch

2) Connecting the client to an Elasticsearch node

from elasticsearch import Elasticsearch

# The initial user ID is 'elastic'; the password generated at first startup can be changed later
ELASTIC_ID = "elastic"
ELASTIC_PASSWORD = "<password>"

# Create the client instance
client = Elasticsearch(
    "https://localhost:9200",   # endpoint
    ca_certs="/path/to/http_ca.crt",
    basic_auth=(ELASTIC_ID, ELASTIC_PASSWORD)
)

# If the connection succeeded, info() returns the cluster details
client.info()
# {'name': 'instance-0000000000', 'cluster_name': ...}

When Elasticsearch is installed and started for the first time, it prints a message like the following.

----------------------------------------------------------------
-> Elasticsearch security features have been automatically configured!
-> Authentication is enabled and cluster connections are encrypted.

->  Password for the elastic user (reset with `bin/elasticsearch-reset-password -u elastic`):
  lhQpLELkjkrawaBoaz0Q

->  HTTP CA certificate SHA-256 fingerprint:
  a52dd93511e8c6045e21f16654b77c9ee0f34aea26d9f40320b531c474676228
...
----------------------------------------------------------------

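This message appears only on the very first startup, so be sure to note the password! If it is lost, it can be reset with the bin/elasticsearch-reset-password command shown in the message.
The http_ca.crt file passed as ca_certs is located in the config/certs folder of the Elasticsearch installation directory.
e.g. elasticsearch-8.15.0/config/certs/http_ca.crt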
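As a variation on the connection code above, the password can be read from the environment instead of being hard-coded, and the server can be verified by the SHA-256 fingerprint printed in the startup message instead of the CA file. The sketch below only assembles the keyword arguments. The helper name `build_client_kwargs` and the environment variable `ELASTIC_PASSWORD` are assumptions made for illustration, not part of the original setup. `ssl_assert_fingerprint` is the client's option for fingerprint verification.

```python
import os


def build_client_kwargs(ca_certs=None, fingerprint=None, user="elastic"):
    """Assemble keyword arguments for Elasticsearch(...).

    Hypothetical helper: exactly one of ca_certs / fingerprint must be given.
    """
    if (ca_certs is None) == (fingerprint is None):
        raise ValueError("pass exactly one of ca_certs or fingerprint")

    # Assumption: the password was exported as ELASTIC_PASSWORD beforehand.
    password = os.environ["ELASTIC_PASSWORD"]

    kwargs = {
        "hosts": "https://localhost:9200",
        "basic_auth": (user, password),
    }
    if ca_certs is not None:
        # Verify the server certificate against the CA file.
        kwargs["ca_certs"] = ca_certs
    else:
        # Verify the server certificate by its SHA-256 fingerprint.
        kwargs["ssl_assert_fingerprint"] = fingerprint
    return kwargs
```

With the package installed, the result would be used as `client = Elasticsearch(**build_client_kwargs(ca_certs="/path/to/http_ca.crt"))`.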

3) Creating an index

An Elasticsearch index is analogous to a table in a MySQL database.

client.indices.create(index="my_index")

# Result
{
    'acknowledged': True, 
    'shards_acknowledged': True, 
    'index': 'my_index'
}

Deleting an index is just as simple.

client.indices.delete(index="my_index")

# Result
{'acknowledged': True}
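Calling create for an index that already exists raises an error, so setup scripts often check first. The following is a minimal sketch of that pattern, assuming a hypothetical helper `ensure_index` and an illustrative `EXAMPLE_MAPPINGS` (the field types are made up for this example). It relies only on `client.indices.exists` and `client.indices.create`.

```python
# Illustrative mapping: field names follow this post, the types are assumptions.
EXAMPLE_MAPPINGS = {
    "properties": {
        "field1": {"type": "text"},
        "field2": {"type": "keyword"},
    }
}


def ensure_index(client, index, mappings=None):
    """Create `index` only if it is missing; return True if it was created."""
    if client.indices.exists(index=index):
        return False
    client.indices.create(index=index, mappings=mappings)
    return True
```

For example, `ensure_index(client, "my_index", EXAMPLE_MAPPINGS)` can be run repeatedly without failing on the second call.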

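4) Managing documents

In a MySQL table each row is managed as an individual record, whereas in Elasticsearch each document is stored as a whole and its contents are added to the inverted index.

(1) Creating a document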

client.index(
    index="my_index",
    id="my_document_id",
    document={
        "field1": "value1",
        "field2": "value2",
    }
)

An Elasticsearch document ID corresponds to the primary key that identifies each record (row) in MySQL.
An Elasticsearch document consists of field-value pairs, which correspond to the columns and values of a MySQL table.

# Document creation result
{
    "_shards": {
        "total": 2,
        "failed": 0,
        "successful": 2
    },
    "_index": "my_index",
    "_id": "my_document_id",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "result": "created"
}
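To make the primary-key analogy concrete, the sketch below indexes row-like dictionaries, using each row's primary key as the document ID and the remaining columns as fields. The helper `index_rows` and the column name `id` are assumptions for illustration. Only `client.index` with the same arguments as above is used.

```python
def index_rows(client, index, rows, pk="id"):
    """Index each row dict as one document, using its primary key as the document ID.

    Returns the "result" value Elasticsearch reports for each row.
    """
    results = []
    for row in rows:
        # The primary key becomes _id; the remaining columns become fields.
        document = {column: value for column, value in row.items() if column != pk}
        resp = client.index(index=index, id=str(row[pk]), document=document)
        results.append(resp["result"])
    return results
```

Indexing a row whose ID already exists overwrites that document, and the reported result is then "updated" rather than "created". For large volumes, one request per row is slow; the client's bulk helpers are the usual alternative.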

(2) Retrieving a document

client.get(index="my_index", id="my_document_id")

Given an index and a document ID, get returns that document.
If the document does not exist, the client raises a 404 error.
- NotFoundError(404, "{'_index': 'my_index', '_id': '0', 'found': False}")

# Document retrieval result
{
    "_index": "my_index",
    "_id": "my_document_id",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "field1": "value1",
        "field2": "value2",
    }
}
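When a missing document is an expected case rather than a failure, the 404 can be avoided by checking for existence first. This is a minimal sketch with a hypothetical helper `get_source_or_none`. It costs two requests and the document could disappear between them, so wrapping `client.get` in a try/except for NotFoundError is the alternative when that matters.

```python
def get_source_or_none(client, index, doc_id):
    """Return the document's _source, or None if the document does not exist."""
    # exists() sends a HEAD request; its response evaluates as a boolean.
    if not client.exists(index=index, id=doc_id):
        return None
    return client.get(index=index, id=doc_id)["_source"]
```

For the document created earlier, `get_source_or_none(client, "my_index", "my_document_id")` would return the dictionary with field1 and field2.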

(3) Searching documents

resp = client.search(index="my_index", query={
    "match": {
        "field1": "value1"
    }
})

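search looks up multiple documents at once.
It sends a query to the "my_index" index and returns the matching documents.
"match" is the query type.
- match: full-text search for the given text. If the text contains several terms, a document matches when it contains any of them (OR search).
- match_all: matches every document.
To limit the number of results, pass size=n to search (the default is 10).
In this example, documents whose field1 contains value1 are returned as a list in resp['hits']['hits'].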

# Search result
{
    'took': 98, 
    'timed_out': False, 
    '_shards': {
        'total': 1, 
        'successful': 1, 
        'skipped': 0, 
        'failed': 0
    }, 
    'hits': {
        'total': {
            'value': 1,   # number of matching documents
            'relation': 'eq'
        }, 
        'max_score': 0.2876821, 
        'hits': [{
            '_index': 'my_index', 
            '_id': 'my_document_id', 
            '_score': 0.2876821, 
            '_source': {
                'field1': 'value1', 
                'field2': 'value2'
            }
        }]
    }
}
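Since the useful part of the response is nested under resp['hits'], a small helper can flatten it. The sketch below uses a hypothetical function `summarize_hits` and reads only the keys shown in the result above.

```python
def summarize_hits(resp):
    """Return (total, [(id, score, source), ...]) from a search response."""
    # hits.total.value is the match count; hits.hits holds the returned documents.
    total = resp["hits"]["total"]["value"]
    rows = [
        (hit["_id"], hit["_score"], hit["_source"])
        for hit in resp["hits"]["hits"]
    ]
    return total, rows
```

A typical call would be `total, rows = summarize_hits(client.search(index="my_index", query={"match_all": {}}, size=5))`. Note that rows holds at most size entries even when total is larger.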

(4) Updating a document

client.update(index="my_index", id="my_document_id", doc={
    "field1": "new value1",
    "new_field": "new value2",
})

update finds the document with ID "my_document_id" in the "my_index" index and applies the contents of doc to it.
"field1": "new value1" changes the value of the "field1" field to "new value1".
"new_field": "new value2" adds a new field named "new_field" and sets its value to "new value2".

# Update result
{
    '_index': 'my_index', 
    '_id': 'my_document_id', 
    '_version': 1,
    'result': 'updated',
    '_shards': {
        'total': 2, 
        'successful': 1, 
        'failed': 0
    },
    '_seq_no': 1, 
    '_primary_term': 1
}

# If the update contains the same values the document already has, the result is noop
{
    '_index': 'my_index', 
    '_id': 'my_document_id', 
    '_version': 1, 
    'result': 'noop', 
    '_shards': {
        'total': 0, 
        'successful': 0, 
        'failed': 0
    }, 
    '_seq_no': 1, 
    '_primary_term': 1
}
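The 'result' field distinguishes a real change from a no-op, which is useful when the caller needs to know whether anything was written. This is a minimal sketch with a hypothetical helper `update_fields`. It uses only `client.update` with the doc argument shown above.

```python
def update_fields(client, index, doc_id, fields):
    """Partially update a document; return True if Elasticsearch changed it."""
    resp = client.update(index=index, id=doc_id, doc=fields)
    # "noop" means the document already had exactly these values.
    return resp["result"] != "noop"
```

For example, calling `update_fields(client, "my_index", "my_document_id", {"field1": "new value1"})` twice in a row would be expected to return True and then False.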

(5) Deleting a document

client.delete(index="my_index", id="my_document_id")

delete removes the document with the given document ID.

# Delete result
{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 1
  },
  "_index": "my_index",
  "_id": "my_document_id",
  "_version": 2,
  "_primary_term": 1,
  "_seq_no": 5,
  "result": "deleted"
}

References

https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/getting-started-python.html
