lab.naminsik

AWS Elasticsearch: changing or adding analysis, filter, and analyzer settings on an existing mapped field (+ tokenizer)

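I wanted to add an analyzer to one field of an index I had already created, because I needed Korean morphological search.
However, once a field has been mapped, its analyzer is hard to change.

This post documents the process.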

Goal: add an analyzer to the schul_nm field of the existing tb_schoolbasicinformation-develop index.

1. Look up the mapping

GET tb_schoolbasicinformation-develop/_mapping

Result (AS-IS):

... (omitted)

"schul_nm" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },

(omitted) ...

(TO-BE) From this state, what I want is the following:

... (omitted)

"schul_nm" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "fielddata": true,
          "analyzer": "korean"
        },

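(omitted) ...

In other words, the goal is to set the field's analyzer to korean (and enable fielddata).
The korean analyzer itself will, of course, first be defined in the settings of the tb_schoolbasicinformation-develop index.

2. Look up the settings of tb_schoolbasicinformation-develop before adding the analyzer to the index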

GET tb_schoolbasicinformation-develop/_settings

Result:

{
  "tb_schoolbasicinformation-develop" : {
    "settings" : {
      "index" : {
        "creation_date" : "1707991821167",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "SADFSDFDASFDSAF",
        "version" : {
          "created" : "7100299"
        },
        "provided_name" : "tb_schoolbasicinformation-develop"
      }
    }
  }
}
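Step 2 amounts to checking whether the index already defines the analyzer. As a minimal sketch, assuming the response has been parsed into a Python dict with the default nested layout shown above, a hypothetical helper has_analyzer can do that check. The response in the example is a trimmed copy of the one above.

```python
def has_analyzer(settings_resp: dict, index: str, analyzer: str) -> bool:
    """Return True if a parsed GET <index>/_settings response defines `analyzer`."""
    # Assumed layout: {index: {"settings": {"index": {"analysis": {"analyzer": {...}}}}}}
    index_settings = settings_resp.get(index, {}).get("settings", {}).get("index", {})
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    return analyzer in analyzers


# Trimmed copy of the step 2 response: there is no "analysis" block yet.
resp = {
    "tb_schoolbasicinformation-develop": {
        "settings": {"index": {"number_of_shards": "5", "number_of_replicas": "1"}}
    }
}
print(has_analyzer(resp, "tb_schoolbasicinformation-develop", "korean"))  # False
```

Once step 3 succeeds, the same call should return True, because the analyzer then appears under settings.index.analysis.analyzer.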

3. Add the analyzer to the index

PUT tb_schoolbasicinformation-develop/_settings
{
    "analysis": {
      "filter": {
          "edge_ngram": {
            "type": "edge_ngram",
            "min_gram": 1,
            "max_gram": 2,
            "token_chars": ["letter", "digit"]
          }
      },
      "tokenizer" : {
        "seunjeon" : {
          "type" : "seunjeon_tokenizer",
          "index_eojeol": false,
          "index_poses": ["UNK","EP","I","J","M","N","SL","SH","SN","VCP","XP","XS","XR"],
          "decompound": true
        }
      },
      "analyzer": {
        "korean": {
          "type": "custom",
          "tokenizer": "seunjeon",
          "filter": [
              "edge_ngram"
          ]
        }
      }
    }
  }

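Using the Seunjeon (은전한잎) plugin, I configured a tokenizer named seunjeon and defined an analyzer named korean that uses it, with edge_ngram as its filter.
This edge_ngram is the filter defined under analysis.filter in the same request.
Note that token_chars is documented as a parameter of the edge_ngram tokenizer, not of the edge_ngram token filter, so it most likely has no effect here.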
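To see what the edge_ngram filter does to each token the seunjeon tokenizer emits, here is a minimal Python sketch of the filter's behaviour with its default options (no preserve_original): for every token it emits the prefixes from min_gram up to max_gram characters. The function names and sample tokens are hypothetical; this only mimics the filter for illustration and is not Elasticsearch's or the plugin's code.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 2) -> list[str]:
    """Return the leading n-grams (prefixes) of token, lengths min_gram..max_gram."""
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")
    # A prefix cannot be longer than the token itself.
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def apply_filter(tokens: list[str], min_gram: int = 1, max_gram: int = 2) -> list[str]:
    """Apply edge_ngrams to every token in order, as a token filter would."""
    out: list[str] = []
    for tok in tokens:
        out.extend(edge_ngrams(tok, min_gram, max_gram))
    return out


# Hypothetical tokens; the real ones depend on the seunjeon tokenizer.
print(apply_filter(["서울", "고등학교"]))        # ['서', '서울', '고', '고등']
print(apply_filter(["고등학교"], max_gram=10))  # ['고', '고등', '고등학', '고등학교']
```

The example shows how max_gram caps the prefix length: with 2, only one- and two-character prefixes of each token are indexed.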

Running this as is fails with the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Can't update non dynamic settings [[index.analysis.analyzer.korean.tokenizer, index.analysis.filter.edge_ngram.min_gram, index.analysis.tokenizer.seunjeon.type, index.analysis.filter.edge_ngram.token_chars, index.analysis.tokenizer.seunjeon.decompound, index.analysis.analyzer.korean.filter, index.analysis.filter.edge_ngram.max_gram, index.analysis.analyzer.korean.type, index.analysis.tokenizer.seunjeon.index_eojeol, index.analysis.tokenizer.seunjeon.index_poses, index.analysis.filter.edge_ngram.type]] for open indices [[tb_schoolbasicinformation-develop/asdf]]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Can't update non dynamic settings [[index.analysis.analyzer.korean.tokenizer, index.analysis.filter.edge_ngram.min_gram, index.analysis.tokenizer.seunjeon.type, index.analysis.filter.edge_ngram.token_chars, index.analysis.tokenizer.seunjeon.decompound, index.analysis.analyzer.korean.filter, index.analysis.filter.edge_ngram.max_gram, index.analysis.analyzer.korean.type, index.analysis.tokenizer.seunjeon.index_eojeol, index.analysis.tokenizer.seunjeon.index_poses, index.analysis.filter.edge_ngram.type]] for open indices [[tb_schoolbasicinformation-develop/asdf]]"
  },
  "status" : 400
}
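The error lists every setting as a flattened, dot-separated key under index.analysis. A small Python sketch (the helper name flatten_settings is hypothetical) reproduces that flattening from the step 3 request body, which makes it easy to see that all eleven rejected keys live under analysis, i.e. they are all non-dynamic analysis settings. Lists are kept as leaf values, matching keys such as index.analysis.tokenizer.seunjeon.index_poses in the error.

```python
def flatten_settings(body: dict, prefix: str = "index") -> dict:
    """Flatten nested settings into dot-separated keys; lists stay as leaf values."""
    flat = {}
    for key, value in body.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


# The request body from step 3.
body = {
    "analysis": {
        "filter": {
            "edge_ngram": {"type": "edge_ngram", "min_gram": 1, "max_gram": 2,
                           "token_chars": ["letter", "digit"]}
        },
        "tokenizer": {
            "seunjeon": {"type": "seunjeon_tokenizer", "index_eojeol": False,
                         "index_poses": ["UNK", "EP", "I", "J", "M", "N", "SL", "SH",
                                         "SN", "VCP", "XP", "XS", "XR"],
                         "decompound": True}
        },
        "analyzer": {
            "korean": {"type": "custom", "tokenizer": "seunjeon",
                       "filter": ["edge_ngram"]}
        },
    }
}

# Prints 11 keys, all starting with "index.analysis.".
for key in sorted(flatten_settings(body)):
    print(key)
```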

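4. Close the index so the settings can be applied

As the error says, analysis settings are non-dynamic (static) settings, and those cannot be updated while the index is open.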

POST tb_schoolbasicinformation-develop/_close

Close the tb_schoolbasicinformation-develop index as shown above.

Result:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : {
    "tb_schoolbasicinformation-develop" : {
      "closed" : true
    }
  }
}

The response shows that the index was closed.

Now run the PUT tb_schoolbasicinformation-develop/_settings request from step 3 again; this time the settings are added and the following result is returned.

Result:

{
  "acknowledged" : true
}

This response means the request succeeded.

5. With the settings in place, reopen the index

POST tb_schoolbasicinformation-develop/_open

Result:

{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

As shown above, the index was reopened.
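Steps 4 and 5 together form a close, update, reopen cycle. A minimal Python sketch of that cycle, assuming a hypothetical send(method, path, body) function that performs one REST call against the cluster (with whatever HTTP client you use), wraps the update in try/finally so the index is reopened even if the settings update is rejected; otherwise the index would be left closed and unsearchable.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: send(method, path, body) performs one REST call
# against the cluster and returns the parsed JSON response.
Send = Callable[[str, str, Optional[dict]], Any]


def update_static_settings(send: Send, index: str, settings: dict) -> Any:
    """Close `index`, apply non-dynamic settings, and always reopen it."""
    send("POST", f"/{index}/_close", None)
    try:
        return send("PUT", f"/{index}/_settings", settings)
    finally:
        # Runs even if the PUT raises, so the index is not left closed.
        send("POST", f"/{index}/_open", None)


# Usage with a fake transport that only records the calls it receives.
calls = []


def fake_send(method, path, body):
    calls.append((method, path))
    return {"acknowledged": True}


result = update_static_settings(fake_send, "tb_schoolbasicinformation-develop",
                                {"analysis": {}})
print(calls)
print(result)  # {'acknowledged': True}
```

If the PUT is rejected, the exception still propagates to the caller, but only after the index has been reopened with its previous settings.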

6. Now for the TO-BE change planned in step 1: adding "analyzer": "korean" to the field

First, let's try updating the mapping directly.

PUT tb_schoolbasicinformation-develop/_mapping
{
  "properties": {
    "schul_nm": { 
      "type": "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      },
      "fielddata": true,
      "analyzer": "korean"
    }
  }
}

Result:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Mapper for [schul_nm] conflicts with existing mapper:\n\tCannot update parameter [analyzer] from [default] to [korean]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Mapper for [schul_nm] conflicts with existing mapper:\n\tCannot update parameter [analyzer] from [default] to [korean]"
  },
  "status" : 400
}

The request fails with this error.

The reason is that the analyzer of a field that already exists in the mapping, such as schul_nm, cannot be changed.
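The conflict can be anticipated before sending the request by comparing the existing field mapping with the desired one. In this Python sketch the table NON_UPDATABLE is an assumption covering only the two parameters relevant here (type and analyzer); Elasticsearch itself refuses changes to more parameters than this on an existing field. A missing analyzer is treated as "default", matching the wording of the error message above. The helper name conflicting_params is hypothetical.

```python
# Assumption: only these parameters are checked; Elasticsearch rejects changes
# to more parameters than this. Value = what an absent parameter counts as.
NON_UPDATABLE = {"type": None, "analyzer": "default"}


def conflicting_params(existing: dict, desired: dict) -> list[str]:
    """List the non-updatable parameters whose value would change."""
    conflicts = []
    for param, implicit in NON_UPDATABLE.items():
        if existing.get(param, implicit) != desired.get(param, implicit):
            conflicts.append(param)
    return conflicts


as_is = {"type": "text",
         "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
to_be = {**as_is, "fielddata": True, "analyzer": "korean"}

print(conflicting_params(as_is, to_be))  # ['analyzer']
```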

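So a workaround is needed. Create an empty index B, set the analysis settings and the new mapping on it in advance, and then reindex the original index A into B.
(Think of it as copy and paste.)

Next, delete the original index A, create A again with the same analysis settings and mapping, and reindex B back into A (another copy and paste).
A has to be recreated first because _reindex copies only documents, not settings or mappings; if the destination does not exist, it is created automatically with default settings and a dynamic mapping. After that, both A and B exist with the settings we wanted.

Finally, delete B, which is no longer needed.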

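In short:
A-1. Create an empty index B
A-2. Apply the analysis settings and mapping to B
A-3. Reindex A into B
A-4. Delete index A
A-5. Recreate A with the same settings and mapping, then reindex B into A
A-6. Delete index B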
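The A-1 to A-6 sequence can be written down as an ordered list of REST calls, which is handy for reviewing it before running anything. This Python sketch only builds the list and sends nothing; the function name migration_plan is hypothetical. It passes the analysis settings and mapping in the create-index request (PUT <index> with a "settings"/"mappings" body) instead of the close, settings, open cycle, and it recreates A explicitly before the second reindex, since _reindex does not copy settings or mappings.

```python
def migration_plan(a: str, b: str, settings: dict, mappings: dict) -> list[tuple]:
    """Return the (method, path, body) calls for the A -> B -> A rebuild."""
    create_body = {"settings": settings, "mappings": mappings}
    return [
        ("PUT", f"/{b}", create_body),                         # A-1 + A-2
        ("POST", "/_reindex", {"source": {"index": a},
                               "dest": {"index": b}}),         # A-3
        ("DELETE", f"/{a}", None),                             # A-4
        ("PUT", f"/{a}", create_body),                         # A-5: recreate A first
        ("POST", "/_reindex", {"source": {"index": b},
                               "dest": {"index": a}}),         # A-5: copy back
        ("DELETE", f"/{b}", None),                             # A-6
    ]


plan = migration_plan(
    "tb_schoolbasicinformation-develop",
    "tb_schoolbasicinformation-develop2",
    settings={"analysis": {}},  # placeholder: use the analysis block from step 8
    mappings={"properties": {"schul_nm": {"type": "text", "analyzer": "korean"}}},
)
for method, path, _ in plan:
    print(method, path)
```

Building the plan as data keeps the destructive DELETE calls visible in one place, so the order can be checked before anything is executed.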

Let's walk through the process with the actual requests.

7. Create an empty index (A-1)

PUT tb_schoolbasicinformation-develop2

If the index was created, the following result is returned:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "tb_schoolbasicinformation-develop2"
}

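8. Apply the analysis settings, set the mapping, and reindex (A-2 to A-3) as follows. Note that this analysis block differs from the one in step 3: max_gram is 10 and the korean analyzer's filter list is empty, so the edge_ngram filter is defined but not applied to the field.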

POST tb_schoolbasicinformation-develop2/_close

PUT tb_schoolbasicinformation-develop2/_settings
{
    "analysis": {
      "filter": {
          "edge_ngram": {
            "type": "edge_ngram",
            "min_gram": 1,
            "max_gram": 10,
            "token_chars": ["letter", "digit"]
          }
      },
      "tokenizer" : {
        "seunjeon" : {
          "type" : "seunjeon_tokenizer",
          "index_eojeol": false,
          "index_poses": ["UNK","EP","I","J","M","N","SL","SH","SN","VCP","XP","XS","XR"],
          "decompound": true
        }
      },
      "analyzer": {
        "korean": {
          "type": "custom",
          "tokenizer": "seunjeon",
          "filter": [
              
          ]
        }
      }
    }
  }
  
POST tb_schoolbasicinformation-develop2/_open

PUT tb_schoolbasicinformation-develop2/_mapping
{
  "properties": {
    "schul_nm": { 
      "type": "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      },
      "fielddata": true,
      "analyzer": "korean"
    }
  }
}

POST _reindex
{
  "source": {
    "index": "tb_schoolbasicinformation-develop"
  },
  "dest": {
    "index": "tb_schoolbasicinformation-develop2"
  }
}

If the reindex succeeds, the result looks like this:

{
  "took" : 10022,
  "timed_out" : false,
  "total" : 13388,
  "updated" : 0,
  "created" : 13388,
  "deleted" : 0,
  "batches" : 14,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

Let's check that the mapping was applied correctly.

GET tb_schoolbasicinformation-develop2/_mapping

        "schul_nm" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "analyzer" : "korean",
          "fielddata" : true
        }

The "analyzer": "korean" entry is present as intended.
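The same check can be scripted against the parsed GET _mapping response. A minimal sketch, assuming the Elasticsearch 7 response layout (index name, then mappings.properties) and a hypothetical helper field_analyzer that reports "default" when no analyzer is set; it works equally well for the final check after step 11.

```python
def field_analyzer(mapping_resp: dict, index: str, field: str) -> str:
    """Return the analyzer of a top-level field, or "default" if none is set."""
    # Assumed ES 7 layout: {index: {"mappings": {"properties": {field: {...}}}}}
    props = mapping_resp[index]["mappings"]["properties"]
    return props[field].get("analyzer", "default")


# Trimmed copy of the GET tb_schoolbasicinformation-develop2/_mapping response.
mapping = {
    "tb_schoolbasicinformation-develop2": {
        "mappings": {
            "properties": {
                "schul_nm": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                    "analyzer": "korean",
                    "fielddata": True,
                }
            }
        }
    }
}
print(field_analyzer(mapping, "tb_schoolbasicinformation-develop2", "schul_nm"))  # korean
```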

9. Now delete the original index (A-4)

DELETE tb_schoolbasicinformation-develop

{
  "acknowledged" : true
}

The deletion succeeded.

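10. Recreate tb_schoolbasicinformation-develop with the same analysis settings and mapping used for tb_schoolbasicinformation-develop2 in steps 7 and 8 (everything except the _reindex call), then reindex tb_schoolbasicinformation-develop2 back into it so that it holds both the new settings and the documents (A-5). The recreation matters because _reindex copies only documents: if the destination index does not exist, it is created automatically with default settings and a dynamic mapping (unless an index template applies), and schul_nm would not get the korean analyzer.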

POST _reindex
{
  "source": {
    "index": "tb_schoolbasicinformation-develop2"
  },
  "dest": {
    "index": "tb_schoolbasicinformation-develop"
  }
}

{
  "took" : 7915,
  "timed_out" : false,
  "total" : 13388,
  "updated" : 0,
  "created" : 13388,
  "deleted" : 0,
  "batches" : 14,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

total and created are both 13388 and failures is empty, so every document was reindexed.
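That reading of the response can be turned into a check to run before deleting anything (for example before step 9 or step 11). A minimal Python sketch, assuming the response is already parsed into a dict; the helper name reindex_ok and the optional expected_docs argument (which could come from GET <index>/_count on the source index) are hypothetical additions, not part of the original procedure.

```python
from typing import Optional


def reindex_ok(resp: dict, expected_docs: Optional[int] = None) -> bool:
    """Sanity-check a parsed _reindex response before deleting the source index."""
    if resp.get("timed_out") or resp.get("failures"):
        return False
    if resp.get("version_conflicts", 0) != 0:
        return False
    written = resp.get("created", 0) + resp.get("updated", 0)
    if written != resp.get("total"):
        return False
    # Optionally compare with a count taken from the source, e.g. GET <index>/_count.
    return expected_docs is None or written == expected_docs


# Trimmed copy of the step 10 response.
resp = {"took": 7915, "timed_out": False, "total": 13388, "updated": 0,
        "created": 13388, "deleted": 0, "version_conflicts": 0, "noops": 0,
        "failures": []}
print(reindex_ok(resp, expected_docs=13388))  # True
```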

11. Finally, delete the index tb_schoolbasicinformation-develop2, which is no longer needed (A-6)

DELETE tb_schoolbasicinformation-develop2

{
  "acknowledged" : true
}

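Now only tb_schoolbasicinformation-develop remains. Checking it with _settings and _mapping confirms that the changes are in place, as long as the index was recreated with the new analysis settings and mapping before the reindex in step 10.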
